This is an R Markdown Notebook for analysis using data on the DC Bus System (WMATA Metrobus). The data were obtained here:
https://planitmetro.com/2016/11/16/data-download-metrobus-vehicle-location-data/
Control + Alt + Shift + m = rename in scope
Load the packages to be used.
package ‘tidyr’ was built under R version 3.2.5package ‘plyr’ was built under R version 3.2.5package ‘stringr’ was built under R version 3.2.5package ‘geosphere’ was built under R version 3.2.5
Get the Bus data.
First let’s check the working directory.
getwd()
[1] "/Users/mdturse/Desktop/Analytics/DCMetroBus"
Then, actually get the data.
The working directory was changed to /Users/mdturse/Desktop/Analytics/DCMetroBus/Bus AVL Oct 2016 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.Oct03Raw
'data.frame': 620274 obs. of 17 variables:
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : Factor w/ 266 levels "10A","10B","10E",..: 224 224 224 224 224 224 224 224 224 224 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 1 1 6 6 ...
$ Route_Direction : Factor w/ 11 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 7 6 3 2 8 1 1 2 3 ...
$ Stop_ID : Factor w/ 10552 levels "","1000001","1000003",..: 9682 9682 9683 9641 9640 8136 9668 9668 9796 9795 ...
$ Stop_Desc : Factor w/ 7740 levels "10TH ST + MICHIGAN AVE",..: 1346 1346 7417 7418 1346 2940 2939 2939 6926 6929 ...
$ Event_Type : int 4 5 4 4 4 3 3 4 4 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 3 2 3 3 3 1 1 3 3 3 ...
$ Event_Time : Factor w/ 75354 levels "10-3-16 1:00:00 AM",..: 47380 47506 47740 47814 47864 48244 48302 48540 49086 49190 ...
$ Departure_Time : Factor w/ 75396 levels "10-3-16 1:00:00 AM",..: 47406 47554 47766 47840 47890 48270 48536 48566 49112 49216 ...
$ Dwell_Time : int 0 11 0 0 0 0 104 0 0 0 ...
$ Delta_Time : int -177 -27 24 165 25 73 719 0 74 76 ...
$ Odometer_Distance: int 43543 43543 45139 46418 50115 51074 51303 53836 55633 56163 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 253 97 276 15 119 100 89 274 104 ...
Oct04Raw
'data.frame': 623427 obs. of 17 variables:
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : Factor w/ 266 levels "10A","10B","10E",..: 225 225 225 225 225 225 225 225 225 225 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 6 6 6 6 6 6 6 ...
$ Route_Direction : Factor w/ 9 levels "ANTICLKW","CLOCKWIS",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Stop_Sequence : int 1 1 8 1 1 1 1 3 2 5 ...
$ Stop_ID : Factor w/ 10555 levels "","1000001","1000003",..: 9671 9671 8138 8138 8138 8138 8138 9798 9799 9638 ...
$ Stop_Desc : Factor w/ 7717 levels "10TH ST + MICHIGAN AVE",..: 2939 2939 2940 2940 2940 2940 2940 6906 6903 4205 ...
$ Event_Type : int 3 4 3 3 3 5 5 4 5 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 1 3 1 1 1 2 2 3 2 3 ...
$ Event_Time : Factor w/ 77713 levels "10-4-16 1:00:00 AM",..: 49126 49240 50858 50908 50976 51116 51172 51714 51842 51940 ...
$ Departure_Time : Factor w/ 77739 levels "10-4-16 1:00:00 AM",..: 49209 49251 50869 50957 50987 51165 51185 51725 51933 51951 ...
$ Dwell_Time : int 79 0 59 19 59 19 1 0 40 0 ...
$ Delta_Time : int 35 36 36 246 244 255 264 159 129 139 ...
$ Odometer_Distance: int 56958 60750 69747 69971 69747 71136 71177 76520 77425 78353 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 58 89 82 76 79 301 274 104 310 2 ...
Oct05Raw
'data.frame': 630900 obs. of 17 variables:
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : Factor w/ 266 levels "10A","10B","10E",..: 224 224 224 224 224 224 224 224 224 224 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Route_Direction : Factor w/ 11 levels "ANTICLKW","CLOCKWIS",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Stop_Sequence : int 1 1 3 4 5 4 3 3 7 7 ...
$ Stop_ID : Factor w/ 10543 levels "","1000001","1000003",..: 9659 9659 9632 9633 9624 9633 9632 9632 9673 9673 ...
$ Stop_Desc : Factor w/ 7725 levels "10TH ST + MICHIGAN AVE",..: 2946 2946 7403 7401 7401 7401 7403 7403 1346 1346 ...
$ Event_Type : int 3 5 4 4 3 4 3 3 5 5 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 1 2 3 3 1 3 1 1 2 2 ...
$ Event_Time : Factor w/ 77725 levels "10-5-16 1:00:00 AM",..: 49279 49371 49899 49953 49993 50135 50221 50493 50783 50987 ...
$ Departure_Time : Factor w/ 77716 levels "10-5-16 1:00:00 AM",..: 49257 49353 49877 49931 49997 50113 50421 50489 50767 50999 ...
$ Dwell_Time : int 189 2 0 0 13 0 111 9 3 17 ...
$ Delta_Time : int 4 78 -114 -3 19 93 297 191 382 499 ...
$ Odometer_Distance: int 37932 38703 44242 44327 44645 45927 46733 47077 50461 51916 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 316 104 276 246 229 246 345 207 109 305 ...
Oct06Raw
'data.frame': 621948 obs. of 17 variables:
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : Factor w/ 265 levels "10A","10B","10E",..: 224 224 224 224 224 224 224 224 224 224 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Route_Direction : Factor w/ 9 levels "ANTICLKW","CLOCKWIS",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Stop_Sequence : int 1 1 7 6 5 4 3 7 7 7 ...
$ Stop_ID : Factor w/ 10562 levels "","1000001","1000003",..: 9678 9678 9692 9693 9643 9652 9651 9692 9692 9692 ...
$ Stop_Desc : Factor w/ 7723 levels "10TH ST + MICHIGAN AVE",..: 2937 2937 1342 7400 7399 7399 7401 1342 1342 1342 ...
$ Event_Type : int 3 5 4 5 3 4 3 4 5 5 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 1 2 3 2 1 3 1 3 2 2 ...
$ Event_Time : Factor w/ 77758 levels "10-6-16 1:00:00 AM",..: 49294 49384 49982 49998 50058 50186 50270 50518 51002 51064 ...
$ Departure_Time : Factor w/ 77792 levels "10-6-16 1:00:00 AM",..: 49305 49399 49993 50023 50091 50197 50487 50529 51065 51079 ...
$ Dwell_Time : int 148 2 0 7 11 0 103 0 26 2 ...
$ Delta_Time : int -6 64 -87 -93 31 104 303 175 497 504 ...
$ Odometer_Distance: int 37950 38726 44130 44197 44592 45935 46739 51826 51826 51838 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 284 71 199 164 223 246 343 199 306 320 ...
Oct07Raw
'data.frame': 622894 obs. of 17 variables:
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : Factor w/ 266 levels "10A","10B","10E",..: 224 224 224 224 224 224 224 224 224 224 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 6 6 6 6 6 6 7 7 ...
$ Route_Direction : Factor w/ 7 levels "ANTICLKW","CLOCKWIS",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Stop_Sequence : int 1 1 2 3 3 5 6 7 1 2 ...
$ Stop_ID : Factor w/ 10556 levels "","1000001","1000003",..: 9672 9672 9800 9799 9799 9639 9640 9641 9641 9642 ...
$ Stop_Desc : Factor w/ 7699 levels "10TH ST + MICHIGAN AVE",..: 2930 2930 6886 6889 6889 4196 4199 6887 6887 4198 ...
$ Event_Type : int 3 4 3 4 5 4 4 3 3 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 1 3 1 3 2 3 3 1 1 3 ...
$ Event_Time : Factor w/ 77562 levels "10-7-16 1:00:00 AM",..: 49134 49136 51718 51756 51888 51934 52018 52044 52130 52288 ...
$ Departure_Time : Factor w/ 77649 levels "10-7-16 1:00:00 AM",..: 49193 49195 51779 51815 51953 51993 52077 52103 52189 52347 ...
$ Dwell_Time : int 153 0 1 0 3 0 0 120 120 0 ...
$ Delta_Time : int 57 56 165 270 197 201 181 189 235 288 ...
$ Odometer_Distance: int 37846 42154 56018 56611 57411 58341 59084 59787 59787 60252 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 98 89 201 104 26 2 1 247 247 182 ...
Put the daily data together.
AllDays <- bind_rows(list(Oct03Raw, Oct04Raw, Oct05Raw, Oct06Raw, Oct07Raw),
.id = c("group")
)
Unequal factor levels: coercing to characterUnequal factor levels: coercing to characterUnequal factor levels: coercing to characterUnequal factor levels: coercing to characterUnequal factor levels: coercing to characterUnequal factor levels: coercing to character
# dim(AllDays)
str(AllDays)
'data.frame': 3119443 obs. of 18 variables:
$ group : chr "1" "1" "1" "1" ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 1 1 6 6 ...
$ Route_Direction : chr "LOOP" "LOOP" "LOOP" "LOOP" ...
$ Stop_Sequence : int 7 7 6 3 2 8 1 1 2 3 ...
$ Stop_ID : chr "5004572" "5004572" "5004573" "5002210" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ Event_Type : int 4 5 4 4 4 3 3 4 4 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 3 2 3 3 3 1 1 3 3 3 ...
$ Event_Time : chr "10-3-16 6:06:47 AM" "10-3-16 6:07:50 AM" "10-3-16 6:09:47 AM" "10-3-16 6:10:24 AM" ...
$ Departure_Time : chr "10-3-16 6:06:47 AM" "10-3-16 6:08:01 AM" "10-3-16 6:09:47 AM" "10-3-16 6:10:24 AM" ...
$ Dwell_Time : int 0 11 0 0 0 0 104 0 0 0 ...
$ Delta_Time : int -177 -27 24 165 25 73 719 0 74 76 ...
$ Odometer_Distance: int 43543 43543 45139 46418 50115 51074 51303 53836 55633 56163 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 253 97 276 15 119 100 89 274 104 ...
Deleting old data frames.
for (i in 3:7){
rm(list = ls(pattern = paste0("Oct0", i, "Raw")
)
)
message("Deleting Oct0", i, "Raw")
}
Deleting Oct03Raw
Deleting Oct04Raw
Deleting Oct05Raw
Deleting Oct06Raw
Deleting Oct07Raw
Updating variable types.
Then, sorting the data and adding a RowNumber (to be used for identifying rows later in the analyses.)
rm(i)
AllDays$group <- factor(AllDays$group)
AllDays$Route_Direction <- factor(AllDays$Route_Direction)
AllDays$Event_Time <- as.POSIXct(AllDays$Event_Time, format = "%m-%d-%y %I:%M:%S %p")
AllDays$Departure_Time <- as.POSIXct(AllDays$Departure_Time, format = "%m-%d-%y %I:%M:%S %p")
str(AllDays)
'data.frame': 3119443 obs. of 18 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 1 1 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 7 6 3 2 8 1 1 2 3 ...
$ Stop_ID : chr "5004572" "5004572" "5004573" "5002210" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ Event_Type : int 4 5 4 4 4 3 3 4 4 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 3 2 3 3 3 1 1 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:07:50" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:08:01" ...
$ Dwell_Time : int 0 11 0 0 0 0 104 0 0 0 ...
$ Delta_Time : int -177 -27 24 165 25 73 719 0 74 76 ...
$ Odometer_Distance: int 43543 43543 45139 46418 50115 51074 51303 53836 55633 56163 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 253 97 276 15 119 100 89 274 104 ...
AllDays_Sorted <- arrange(AllDays,
Bus_ID,
Event_Time
) %>%
mutate(RowNum_OG = row_number() # this is useful in identify the row later on
)
rm(AllDays)
str(AllDays_Sorted)
'data.frame': 3119443 obs. of 19 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 1 1 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 7 6 3 2 8 1 1 2 3 ...
$ Stop_ID : chr "5004572" "5004572" "5004573" "5002210" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ Event_Type : int 4 5 4 4 4 3 3 4 4 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 3 2 3 3 3 1 1 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:07:50" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:08:01" ...
$ Dwell_Time : int 0 11 0 0 0 0 104 0 0 0 ...
$ Delta_Time : int -177 -27 24 165 25 73 719 0 74 76 ...
$ Odometer_Distance: int 43543 43543 45139 46418 50115 51074 51303 53836 55633 56163 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 253 97 276 15 119 100 89 274 104 ...
$ RowNum_OG : int 1 2 3 4 5 6 7 8 9 10 ...
# View(head(AllDays_Sorted, 100))
Inspecting the values of Stop_ID, and finding that it can take the values “” (blank) and “NULL”.
View(group_by(AllDays_Sorted,
Stop_ID
) %>%
summarise(
Cnt = n()
) %>%
arrange(Stop_ID)
)
View(filter(AllDays_Sorted,
is.na(Stop_ID) |
Stop_ID == "" |
Stop_ID == "NULL"
) %>%
arrange(Stop_Desc)
)
Creating a table of distinct Stop_Desc values when Stop_ID is “” (blank) or “NULL”.
StopID_New <- filter(AllDays_Sorted,
is.na(Stop_ID) |
Stop_ID == "" |
Stop_ID == "NULL"
) %>%
select(Stop_ID, Stop_Desc) %>%
distinct() %>%
arrange(Stop_ID, Stop_Desc) %>%
mutate(StopID_New = 1:nrow(.)
)
View(StopID_New)
Creating a full updated table by filling in StopID_New for when Stop_ID is “” (blank) or NULL.
AllDays_StopIDNew <- left_join(AllDays_Sorted,
select(StopID_New,
Stop_Desc,
StopID_New
),
by = c("Stop_Desc" = "Stop_Desc")
) %>%
mutate(StopID_Clean = ifelse(is.na(StopID_New),
Stop_ID,
StopID_New
),
StopID_Indicator = factor(ifelse(is.na(StopID_New),
"ID_OK",
"ID_Bad"
)
)
)
rm(StopID_New)
rm(AllDays_Sorted)
str(AllDays_StopIDNew)
'data.frame': 3119443 obs. of 22 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 1 1 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 7 6 3 2 8 1 1 2 3 ...
$ Stop_ID : chr "5004572" "5004572" "5004573" "5002210" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ Event_Type : int 4 5 4 4 4 3 3 4 4 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 3 2 3 3 3 1 1 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:07:50" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:08:01" ...
$ Dwell_Time : int 0 11 0 0 0 0 104 0 0 0 ...
$ Delta_Time : int -177 -27 24 165 25 73 719 0 74 76 ...
$ Odometer_Distance: int 43543 43543 45139 46418 50115 51074 51303 53836 55633 56163 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 253 97 276 15 119 100 89 274 104 ...
$ RowNum_OG : int 1 2 3 4 5 6 7 8 9 10 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004572" "5004573" "5002210" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
# View(tail(AllDays_StopIDNew, 500))
# View(filter(AllDays_StopIDNew,
# Stop_Desc == "METROWAY ANNNOUCEMNT CORR"
# )
# )
Lat Long stats for pulling in Zip codes later.
LL_Stats <- group_by(AllDays_StopIDNew,
StopID_Clean
) %>%
summarise(Lat_Mean = mean(Latitude, na.rm = TRUE),
Lat_Med = median(Latitude, na.rm = TRUE),
Lng_Mean = mean(Longitude, na.rm = TRUE),
Lng_Med = median(Longitude, na.rm = TRUE)
) %>%
mutate(Lat_MeaLessMed = Lat_Mean - Lat_Med,
Lng_MeaLessMed = Lng_Mean - Lng_Med,
RowNum = row_number()
)
str(LL_Stats)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10588 obs. of 8 variables:
$ StopID_Clean : chr "1" "10" "1000001" "1000003" ...
$ Lat_Mean : num 38.9 39 38.8 38.8 38.8 ...
$ Lat_Med : num 38.9 39 38.8 38.8 38.8 ...
$ Lng_Mean : num -77.1 -77.4 -77 -77 -77 ...
$ Lng_Med : num -77.1 -77.4 -77 -77 -77 ...
$ Lat_MeaLessMed: num 8.37e-05 0.00 -5.40e-06 2.15e-05 9.27e-06 ...
$ Lng_MeaLessMed: num -8.47e-05 0.00 1.07e-05 1.62e-04 -4.82e-07 ...
$ RowNum : int 1 2 3 4 5 6 7 8 9 10 ...
summary(LL_Stats)
StopID_Clean Lat_Mean Lat_Med Lng_Mean
Length:10588 Min. :38.09 Min. :38.60 Min. :-77.45
Class :character 1st Qu.:38.85 1st Qu.:38.85 1st Qu.:-77.10
Mode :character Median :38.90 Median :38.90 Median :-77.01
Mean :38.91 Mean :38.91 Mean :-77.03
3rd Qu.:38.96 3rd Qu.:38.96 3rd Qu.:-76.95
Max. :39.19 Max. :39.19 Max. :-75.42
Lng_Med Lat_MeaLessMed Lng_MeaLessMed RowNum
Min. :-77.45 Min. :-0.8069882 Min. :-0.0102476 Min. : 1
1st Qu.:-77.10 1st Qu.:-0.0000126 1st Qu.:-0.0000167 1st Qu.: 2648
Median :-77.01 Median : 0.0000003 Median :-0.0000003 Median : 5294
Mean :-77.03 Mean :-0.0001867 Mean : 0.0003654 Mean : 5294
3rd Qu.:-76.95 3rd Qu.: 0.0000141 3rd Qu.: 0.0000148 3rd Qu.: 7941
Max. :-76.67 Max. : 0.0093569 Max. : 1.6050662 Max. :10588
View(head(arrange(LL_Stats,
Lat_MeaLessMed
),
500
)
)
View(head(arrange(LL_Stats,
desc(Lat_MeaLessMed)
),
500
)
)
View(head(arrange(LL_Stats,
Lng_MeaLessMed
),
500
)
)
View(head(arrange(LL_Stats,
desc(Lng_MeaLessMed)
),
500
)
)
Pulling in Zip Code data.
# pages <- list()
# pages2 <- list()
# pages3 <- list()
# pages4 <- list()
pages5 <- list()
# pages6 <- list()
# max_row_per_hr <- 1000
# need to group in bunches as http://api.geonames.org limits pulls to 2000 per hour
system.time(
# for(j in 0:5){
# for(k in ((max_row_per_hr*j) + 1):(max_row_per_hr*(j+1)
# )
# ){
#
# }
# }
for(# i in 1:nrow(LL_Stats)
# i in 1:1950
i in 1:1000 # 2051 #3900
# i in 3901:5850
# i in 5851:7800
# i in 7801:9750
# i in 9751:nrow(LL_Stats)
){
lat <- filter(LL_Stats,
RowNum == (i + 1950 + 1000 + 1000 + 1000)
) %>%
select(Lat_Med)
# lat <- LL_Stats_Test[i, Lat_Med]
lng <- filter(LL_Stats,
RowNum == (i + 1950 + 1000 + 1000 + 1000)
) %>%
select(Lng_Med)
# lng <- LL_Stats_Test[i, Lng_Med]
# APIData <- fromJSON(paste0(url_1,
# APIData2 <- fromJSON(paste0(url_1,
# APIData3 <- fromJSON(paste0(url_1,
# APIData4 <- fromJSON(paste0(url_1,
APIData5 <- fromJSON(paste0(url_1,
# APIData6 <- fromJSON(paste0(url_1,
lat,
url_2,
lng,
url_3,
username
),
flatten = TRUE
)
message("Retrieving Zip Code ", (i + 1950 + 1000 + 1000 + 1000)
)
# pages[[i]] <- APIData$postalCodes
# pages2[[i]] <- APIData2$postalCodes
# pages3[[i]] <- APIData3$postalCodes
# pages4[[i]] <- APIData4$postalCodes
pages5[[i]] <- APIData5$postalCodes
# pages6[[i]] <- APIData6$postalCodes
# Sys.sleep(3900)
}
)
Retrieving Zip Code 4951
Retrieving Zip Code 4952
Retrieving Zip Code 4953
Retrieving Zip Code 4954
Retrieving Zip Code 4955
Retrieving Zip Code 4956
Retrieving Zip Code 4957
Retrieving Zip Code 4958
Retrieving Zip Code 4959
Retrieving Zip Code 4960
Retrieving Zip Code 4961
Retrieving Zip Code 4962
Retrieving Zip Code 4963
Retrieving Zip Code 4964
Retrieving Zip Code 4965
Retrieving Zip Code 4966
Retrieving Zip Code 4967
Retrieving Zip Code 4968
Retrieving Zip Code 4969
Retrieving Zip Code 4970
Retrieving Zip Code 4971
Retrieving Zip Code 4972
Retrieving Zip Code 4973
Retrieving Zip Code 4974
Retrieving Zip Code 4975
Retrieving Zip Code 4976
Retrieving Zip Code 4977
Retrieving Zip Code 4978
Retrieving Zip Code 4979
Retrieving Zip Code 4980
Retrieving Zip Code 4981
Retrieving Zip Code 4982
Retrieving Zip Code 4983
Retrieving Zip Code 4984
Retrieving Zip Code 4985
Retrieving Zip Code 4986
Retrieving Zip Code 4987
Retrieving Zip Code 4988
Retrieving Zip Code 4989
Retrieving Zip Code 4990
Retrieving Zip Code 4991
Retrieving Zip Code 4992
Retrieving Zip Code 4993
Retrieving Zip Code 4994
Retrieving Zip Code 4995
Retrieving Zip Code 4996
Retrieving Zip Code 4997
Retrieving Zip Code 4998
Retrieving Zip Code 4999
Retrieving Zip Code 5000
Retrieving Zip Code 5001
Retrieving Zip Code 5002
Retrieving Zip Code 5003
Retrieving Zip Code 5004
Retrieving Zip Code 5005
Retrieving Zip Code 5006
Retrieving Zip Code 5007
Retrieving Zip Code 5008
Retrieving Zip Code 5009
Retrieving Zip Code 5010
Retrieving Zip Code 5011
Retrieving Zip Code 5012
Retrieving Zip Code 5013
Retrieving Zip Code 5014
Retrieving Zip Code 5015
Retrieving Zip Code 5016
Retrieving Zip Code 5017
Retrieving Zip Code 5018
Retrieving Zip Code 5019
Retrieving Zip Code 5020
Retrieving Zip Code 5021
Retrieving Zip Code 5022
Retrieving Zip Code 5023
Retrieving Zip Code 5024
Retrieving Zip Code 5025
Retrieving Zip Code 5026
Retrieving Zip Code 5027
Retrieving Zip Code 5028
Retrieving Zip Code 5029
Retrieving Zip Code 5030
Retrieving Zip Code 5031
Retrieving Zip Code 5032
Retrieving Zip Code 5033
Retrieving Zip Code 5034
Retrieving Zip Code 5035
Retrieving Zip Code 5036
Retrieving Zip Code 5037
Retrieving Zip Code 5038
Retrieving Zip Code 5039
Retrieving Zip Code 5040
Retrieving Zip Code 5041
Retrieving Zip Code 5042
Retrieving Zip Code 5043
Retrieving Zip Code 5044
Retrieving Zip Code 5045
Retrieving Zip Code 5046
Retrieving Zip Code 5047
Retrieving Zip Code 5048
Retrieving Zip Code 5049
Retrieving Zip Code 5050
Retrieving Zip Code 5051
Retrieving Zip Code 5052
Retrieving Zip Code 5053
Retrieving Zip Code 5054
Retrieving Zip Code 5055
Retrieving Zip Code 5056
Retrieving Zip Code 5057
Retrieving Zip Code 5058
Retrieving Zip Code 5059
Retrieving Zip Code 5060
Retrieving Zip Code 5061
Retrieving Zip Code 5062
Retrieving Zip Code 5063
Retrieving Zip Code 5064
Retrieving Zip Code 5065
Retrieving Zip Code 5066
Retrieving Zip Code 5067
Retrieving Zip Code 5068
Retrieving Zip Code 5069
Retrieving Zip Code 5070
Retrieving Zip Code 5071
Retrieving Zip Code 5072
Retrieving Zip Code 5073
Retrieving Zip Code 5074
Retrieving Zip Code 5075
Retrieving Zip Code 5076
Retrieving Zip Code 5077
Retrieving Zip Code 5078
Retrieving Zip Code 5079
Retrieving Zip Code 5080
Retrieving Zip Code 5081
Retrieving Zip Code 5082
Retrieving Zip Code 5083
Retrieving Zip Code 5084
Retrieving Zip Code 5085
Retrieving Zip Code 5086
Retrieving Zip Code 5087
Retrieving Zip Code 5088
Retrieving Zip Code 5089
Retrieving Zip Code 5090
Retrieving Zip Code 5091
Retrieving Zip Code 5092
Retrieving Zip Code 5093
Retrieving Zip Code 5094
Retrieving Zip Code 5095
Retrieving Zip Code 5096
Retrieving Zip Code 5097
Retrieving Zip Code 5098
Retrieving Zip Code 5099
Retrieving Zip Code 5100
Retrieving Zip Code 5101
Retrieving Zip Code 5102
Retrieving Zip Code 5103
Retrieving Zip Code 5104
Retrieving Zip Code 5105
Retrieving Zip Code 5106
Retrieving Zip Code 5107
Retrieving Zip Code 5108
Retrieving Zip Code 5109
Retrieving Zip Code 5110
Retrieving Zip Code 5111
Retrieving Zip Code 5112
Retrieving Zip Code 5113
Retrieving Zip Code 5114
Retrieving Zip Code 5115
Retrieving Zip Code 5116
Retrieving Zip Code 5117
Retrieving Zip Code 5118
Retrieving Zip Code 5119
Retrieving Zip Code 5120
Retrieving Zip Code 5121
Retrieving Zip Code 5122
Retrieving Zip Code 5123
Retrieving Zip Code 5124
Retrieving Zip Code 5125
Retrieving Zip Code 5126
Retrieving Zip Code 5127
Retrieving Zip Code 5128
Retrieving Zip Code 5129
Retrieving Zip Code 5130
Retrieving Zip Code 5131
Retrieving Zip Code 5132
Retrieving Zip Code 5133
Retrieving Zip Code 5134
Retrieving Zip Code 5135
Retrieving Zip Code 5136
Retrieving Zip Code 5137
Retrieving Zip Code 5138
Retrieving Zip Code 5139
Retrieving Zip Code 5140
Retrieving Zip Code 5141
Retrieving Zip Code 5142
Retrieving Zip Code 5143
Retrieving Zip Code 5144
Retrieving Zip Code 5145
Retrieving Zip Code 5146
Retrieving Zip Code 5147
Retrieving Zip Code 5148
Retrieving Zip Code 5149
Retrieving Zip Code 5150
Retrieving Zip Code 5151
Retrieving Zip Code 5152
Retrieving Zip Code 5153
Retrieving Zip Code 5154
Retrieving Zip Code 5155
Retrieving Zip Code 5156
Retrieving Zip Code 5157
Retrieving Zip Code 5158
Retrieving Zip Code 5159
Retrieving Zip Code 5160
Retrieving Zip Code 5161
Retrieving Zip Code 5162
Retrieving Zip Code 5163
Retrieving Zip Code 5164
Retrieving Zip Code 5165
Retrieving Zip Code 5166
Retrieving Zip Code 5167
Retrieving Zip Code 5168
Retrieving Zip Code 5169
Retrieving Zip Code 5170
Retrieving Zip Code 5171
Retrieving Zip Code 5172
Retrieving Zip Code 5173
Retrieving Zip Code 5174
Retrieving Zip Code 5175
Retrieving Zip Code 5176
Retrieving Zip Code 5177
Retrieving Zip Code 5178
Retrieving Zip Code 5179
Retrieving Zip Code 5180
Retrieving Zip Code 5181
Retrieving Zip Code 5182
Retrieving Zip Code 5183
Retrieving Zip Code 5184
Retrieving Zip Code 5185
Retrieving Zip Code 5186
Retrieving Zip Code 5187
Retrieving Zip Code 5188
Retrieving Zip Code 5189
Retrieving Zip Code 5190
Retrieving Zip Code 5191
Retrieving Zip Code 5192
Retrieving Zip Code 5193
Retrieving Zip Code 5194
Retrieving Zip Code 5195
Retrieving Zip Code 5196
Retrieving Zip Code 5197
Retrieving Zip Code 5198
Retrieving Zip Code 5199
Retrieving Zip Code 5200
Retrieving Zip Code 5201
Retrieving Zip Code 5202
Retrieving Zip Code 5203
Retrieving Zip Code 5204
Retrieving Zip Code 5205
Retrieving Zip Code 5206
Retrieving Zip Code 5207
Retrieving Zip Code 5208
Retrieving Zip Code 5209
Retrieving Zip Code 5210
Retrieving Zip Code 5211
Retrieving Zip Code 5212
Retrieving Zip Code 5213
Retrieving Zip Code 5214
Retrieving Zip Code 5215
Retrieving Zip Code 5216
Retrieving Zip Code 5217
Retrieving Zip Code 5218
Retrieving Zip Code 5219
Retrieving Zip Code 5220
Retrieving Zip Code 5221
Retrieving Zip Code 5222
Retrieving Zip Code 5223
Retrieving Zip Code 5224
Retrieving Zip Code 5225
Retrieving Zip Code 5226
Retrieving Zip Code 5227
Retrieving Zip Code 5228
Retrieving Zip Code 5229
Retrieving Zip Code 5230
Retrieving Zip Code 5231
Retrieving Zip Code 5232
Retrieving Zip Code 5233
Retrieving Zip Code 5234
Retrieving Zip Code 5235
Retrieving Zip Code 5236
Retrieving Zip Code 5237
Retrieving Zip Code 5238
Retrieving Zip Code 5239
Retrieving Zip Code 5240
Retrieving Zip Code 5241
Retrieving Zip Code 5242
Retrieving Zip Code 5243
Retrieving Zip Code 5244
Retrieving Zip Code 5245
Retrieving Zip Code 5246
Retrieving Zip Code 5247
Retrieving Zip Code 5248
Retrieving Zip Code 5249
Retrieving Zip Code 5250
Retrieving Zip Code 5251
Retrieving Zip Code 5252
Retrieving Zip Code 5253
Retrieving Zip Code 5254
Retrieving Zip Code 5255
Retrieving Zip Code 5256
Retrieving Zip Code 5257
Retrieving Zip Code 5258
Retrieving Zip Code 5259
Retrieving Zip Code 5260
Retrieving Zip Code 5261
Retrieving Zip Code 5262
Retrieving Zip Code 5263
Retrieving Zip Code 5264
Retrieving Zip Code 5265
Retrieving Zip Code 5266
Retrieving Zip Code 5267
Retrieving Zip Code 5268
Retrieving Zip Code 5269
Retrieving Zip Code 5270
Retrieving Zip Code 5271
Retrieving Zip Code 5272
Retrieving Zip Code 5273
Retrieving Zip Code 5274
Retrieving Zip Code 5275
Retrieving Zip Code 5276
Retrieving Zip Code 5277
Retrieving Zip Code 5278
Retrieving Zip Code 5279
Retrieving Zip Code 5280
Retrieving Zip Code 5281
Retrieving Zip Code 5282
Retrieving Zip Code 5283
Retrieving Zip Code 5284
Retrieving Zip Code 5285
Retrieving Zip Code 5286
Retrieving Zip Code 5287
Retrieving Zip Code 5288
Retrieving Zip Code 5289
Retrieving Zip Code 5290
Retrieving Zip Code 5291
Retrieving Zip Code 5292
Retrieving Zip Code 5293
Retrieving Zip Code 5294
Retrieving Zip Code 5295
Retrieving Zip Code 5296
Retrieving Zip Code 5297
Retrieving Zip Code 5298
Retrieving Zip Code 5299
Retrieving Zip Code 5300
Retrieving Zip Code 5301
Retrieving Zip Code 5302
Retrieving Zip Code 5303
Retrieving Zip Code 5304
Retrieving Zip Code 5305
Retrieving Zip Code 5306
Retrieving Zip Code 5307
Retrieving Zip Code 5308
Retrieving Zip Code 5309
Retrieving Zip Code 5310
Retrieving Zip Code 5311
Retrieving Zip Code 5312
Retrieving Zip Code 5313
Retrieving Zip Code 5314
Retrieving Zip Code 5315
Retrieving Zip Code 5316
Retrieving Zip Code 5317
Retrieving Zip Code 5318
Retrieving Zip Code 5319
Retrieving Zip Code 5320
Retrieving Zip Code 5321
Retrieving Zip Code 5322
Retrieving Zip Code 5323
Retrieving Zip Code 5324
Retrieving Zip Code 5325
Retrieving Zip Code 5326
Retrieving Zip Code 5327
Retrieving Zip Code 5328
Retrieving Zip Code 5329
Retrieving Zip Code 5330
Retrieving Zip Code 5331
Retrieving Zip Code 5332
Retrieving Zip Code 5333
Retrieving Zip Code 5334
Retrieving Zip Code 5335
Retrieving Zip Code 5336
Retrieving Zip Code 5337
Retrieving Zip Code 5338
Retrieving Zip Code 5339
Retrieving Zip Code 5340
Retrieving Zip Code 5341
Retrieving Zip Code 5342
Retrieving Zip Code 5343
Retrieving Zip Code 5344
Retrieving Zip Code 5345
Retrieving Zip Code 5346
Retrieving Zip Code 5347
Retrieving Zip Code 5348
Retrieving Zip Code 5349
Retrieving Zip Code 5350
Retrieving Zip Code 5351
Retrieving Zip Code 5352
Retrieving Zip Code 5353
Retrieving Zip Code 5354
Retrieving Zip Code 5355
Retrieving Zip Code 5356
Retrieving Zip Code 5357
Retrieving Zip Code 5358
Retrieving Zip Code 5359
Retrieving Zip Code 5360
Retrieving Zip Code 5361
Retrieving Zip Code 5362
Retrieving Zip Code 5363
Retrieving Zip Code 5364
Retrieving Zip Code 5365
Retrieving Zip Code 5366
Retrieving Zip Code 5367
Retrieving Zip Code 5368
Retrieving Zip Code 5369
Retrieving Zip Code 5370
Retrieving Zip Code 5371
Retrieving Zip Code 5372
Retrieving Zip Code 5373
Retrieving Zip Code 5374
Retrieving Zip Code 5375
Retrieving Zip Code 5376
Retrieving Zip Code 5377
Retrieving Zip Code 5378
Retrieving Zip Code 5379
Retrieving Zip Code 5380
Retrieving Zip Code 5381
Retrieving Zip Code 5382
Retrieving Zip Code 5383
Retrieving Zip Code 5384
Retrieving Zip Code 5385
Retrieving Zip Code 5386
Retrieving Zip Code 5387
Retrieving Zip Code 5388
Retrieving Zip Code 5389
Retrieving Zip Code 5390
Retrieving Zip Code 5391
Retrieving Zip Code 5392
Retrieving Zip Code 5393
Retrieving Zip Code 5394
Retrieving Zip Code 5395
Retrieving Zip Code 5396
Retrieving Zip Code 5397
Retrieving Zip Code 5398
Retrieving Zip Code 5399
Retrieving Zip Code 5400
Retrieving Zip Code 5401
Retrieving Zip Code 5402
Retrieving Zip Code 5403
Retrieving Zip Code 5404
Retrieving Zip Code 5405
Retrieving Zip Code 5406
Retrieving Zip Code 5407
Retrieving Zip Code 5408
Retrieving Zip Code 5409
Retrieving Zip Code 5410
Retrieving Zip Code 5411
Retrieving Zip Code 5412
Retrieving Zip Code 5413
Retrieving Zip Code 5414
Retrieving Zip Code 5415
Retrieving Zip Code 5416
Retrieving Zip Code 5417
Retrieving Zip Code 5418
Retrieving Zip Code 5419
Retrieving Zip Code 5420
Retrieving Zip Code 5421
Retrieving Zip Code 5422
Retrieving Zip Code 5423
Retrieving Zip Code 5424
Retrieving Zip Code 5425
Retrieving Zip Code 5426
Retrieving Zip Code 5427
Retrieving Zip Code 5428
Retrieving Zip Code 5429
Retrieving Zip Code 5430
Retrieving Zip Code 5431
Retrieving Zip Code 5432
Retrieving Zip Code 5433
Retrieving Zip Code 5434
Retrieving Zip Code 5435
Retrieving Zip Code 5436
Retrieving Zip Code 5437
Retrieving Zip Code 5438
Retrieving Zip Code 5439
Retrieving Zip Code 5440
Retrieving Zip Code 5441
Retrieving Zip Code 5442
Retrieving Zip Code 5443
Retrieving Zip Code 5444
Retrieving Zip Code 5445
Retrieving Zip Code 5446
Retrieving Zip Code 5447
Retrieving Zip Code 5448
Retrieving Zip Code 5449
Retrieving Zip Code 5450
Retrieving Zip Code 5451
Retrieving Zip Code 5452
Retrieving Zip Code 5453
Retrieving Zip Code 5454
Retrieving Zip Code 5455
Retrieving Zip Code 5456
Retrieving Zip Code 5457
Retrieving Zip Code 5458
Retrieving Zip Code 5459
Retrieving Zip Code 5460
Retrieving Zip Code 5461
Retrieving Zip Code 5462
Retrieving Zip Code 5463
Retrieving Zip Code 5464
Retrieving Zip Code 5465
Retrieving Zip Code 5466
Retrieving Zip Code 5467
Retrieving Zip Code 5468
Retrieving Zip Code 5469
Retrieving Zip Code 5470
Retrieving Zip Code 5471
Retrieving Zip Code 5472
Retrieving Zip Code 5473
Retrieving Zip Code 5474
Retrieving Zip Code 5475
Retrieving Zip Code 5476
Retrieving Zip Code 5477
Retrieving Zip Code 5478
Retrieving Zip Code 5479
Retrieving Zip Code 5480
Retrieving Zip Code 5481
Retrieving Zip Code 5482
Retrieving Zip Code 5483
Retrieving Zip Code 5484
Retrieving Zip Code 5485
Retrieving Zip Code 5486
Retrieving Zip Code 5487
Retrieving Zip Code 5488
Retrieving Zip Code 5489
Retrieving Zip Code 5490
Retrieving Zip Code 5491
Retrieving Zip Code 5492
Retrieving Zip Code 5493
Retrieving Zip Code 5494
Retrieving Zip Code 5495
Retrieving Zip Code 5496
Retrieving Zip Code 5497
Retrieving Zip Code 5498
Retrieving Zip Code 5499
Retrieving Zip Code 5500
Retrieving Zip Code 5501
Retrieving Zip Code 5502
Retrieving Zip Code 5503
Retrieving Zip Code 5504
Retrieving Zip Code 5505
Retrieving Zip Code 5506
Retrieving Zip Code 5507
Retrieving Zip Code 5508
Retrieving Zip Code 5509
Retrieving Zip Code 5510
Retrieving Zip Code 5511
Retrieving Zip Code 5512
Retrieving Zip Code 5513
Retrieving Zip Code 5514
Retrieving Zip Code 5515
Retrieving Zip Code 5516
Retrieving Zip Code 5517
Retrieving Zip Code 5518
Retrieving Zip Code 5519
Retrieving Zip Code 5520
Retrieving Zip Code 5521
Retrieving Zip Code 5522
Retrieving Zip Code 5523
Retrieving Zip Code 5524
Retrieving Zip Code 5525
Retrieving Zip Code 5526
Retrieving Zip Code 5527
Retrieving Zip Code 5528
Retrieving Zip Code 5529
Retrieving Zip Code 5530
Retrieving Zip Code 5531
Retrieving Zip Code 5532
Retrieving Zip Code 5533
Retrieving Zip Code 5534
Retrieving Zip Code 5535
Retrieving Zip Code 5536
Retrieving Zip Code 5537
Retrieving Zip Code 5538
Retrieving Zip Code 5539
Retrieving Zip Code 5540
Retrieving Zip Code 5541
Retrieving Zip Code 5542
Retrieving Zip Code 5543
Retrieving Zip Code 5544
Retrieving Zip Code 5545
Retrieving Zip Code 5546
Retrieving Zip Code 5547
Retrieving Zip Code 5548
Retrieving Zip Code 5549
Retrieving Zip Code 5550
Retrieving Zip Code 5551
Retrieving Zip Code 5552
Retrieving Zip Code 5553
Retrieving Zip Code 5554
Retrieving Zip Code 5555
Retrieving Zip Code 5556
Retrieving Zip Code 5557
Retrieving Zip Code 5558
Retrieving Zip Code 5559
Retrieving Zip Code 5560
Retrieving Zip Code 5561
Retrieving Zip Code 5562
Retrieving Zip Code 5563
Retrieving Zip Code 5564
Retrieving Zip Code 5565
Retrieving Zip Code 5566
Retrieving Zip Code 5567
Retrieving Zip Code 5568
Retrieving Zip Code 5569
Retrieving Zip Code 5570
Retrieving Zip Code 5571
Retrieving Zip Code 5572
Retrieving Zip Code 5573
Retrieving Zip Code 5574
Retrieving Zip Code 5575
Retrieving Zip Code 5576
Retrieving Zip Code 5577
Retrieving Zip Code 5578
Retrieving Zip Code 5579
Retrieving Zip Code 5580
Retrieving Zip Code 5581
Retrieving Zip Code 5582
Retrieving Zip Code 5583
Retrieving Zip Code 5584
Retrieving Zip Code 5585
Retrieving Zip Code 5586
Retrieving Zip Code 5587
Retrieving Zip Code 5588
Retrieving Zip Code 5589
Retrieving Zip Code 5590
Retrieving Zip Code 5591
Retrieving Zip Code 5592
Retrieving Zip Code 5593
Retrieving Zip Code 5594
Retrieving Zip Code 5595
Retrieving Zip Code 5596
Retrieving Zip Code 5597
Retrieving Zip Code 5598
Retrieving Zip Code 5599
Retrieving Zip Code 5600
Retrieving Zip Code 5601
Retrieving Zip Code 5602
Retrieving Zip Code 5603
Retrieving Zip Code 5604
Retrieving Zip Code 5605
Retrieving Zip Code 5606
Retrieving Zip Code 5607
Retrieving Zip Code 5608
Retrieving Zip Code 5609
Retrieving Zip Code 5610
Retrieving Zip Code 5611
Retrieving Zip Code 5612
Retrieving Zip Code 5613
Retrieving Zip Code 5614
Retrieving Zip Code 5615
Retrieving Zip Code 5616
Retrieving Zip Code 5617
Retrieving Zip Code 5618
Retrieving Zip Code 5619
Retrieving Zip Code 5620
Retrieving Zip Code 5621
Retrieving Zip Code 5622
Retrieving Zip Code 5623
Retrieving Zip Code 5624
Retrieving Zip Code 5625
Retrieving Zip Code 5626
Retrieving Zip Code 5627
Retrieving Zip Code 5628
Retrieving Zip Code 5629
Retrieving Zip Code 5630
Retrieving Zip Code 5631
Retrieving Zip Code 5632
Retrieving Zip Code 5633
Retrieving Zip Code 5634
Retrieving Zip Code 5635
Retrieving Zip Code 5636
Retrieving Zip Code 5637
Retrieving Zip Code 5638
Retrieving Zip Code 5639
Retrieving Zip Code 5640
Retrieving Zip Code 5641
Retrieving Zip Code 5642
Retrieving Zip Code 5643
Retrieving Zip Code 5644
Retrieving Zip Code 5645
Retrieving Zip Code 5646
Retrieving Zip Code 5647
Retrieving Zip Code 5648
Retrieving Zip Code 5649
Retrieving Zip Code 5650
Retrieving Zip Code 5651
Retrieving Zip Code 5652
Retrieving Zip Code 5653
Retrieving Zip Code 5654
Retrieving Zip Code 5655
Retrieving Zip Code 5656
Retrieving Zip Code 5657
Retrieving Zip Code 5658
Retrieving Zip Code 5659
Retrieving Zip Code 5660
Retrieving Zip Code 5661
Retrieving Zip Code 5662
Retrieving Zip Code 5663
Retrieving Zip Code 5664
Retrieving Zip Code 5665
Retrieving Zip Code 5666
Retrieving Zip Code 5667
Retrieving Zip Code 5668
Retrieving Zip Code 5669
Retrieving Zip Code 5670
Retrieving Zip Code 5671
Retrieving Zip Code 5672
Retrieving Zip Code 5673
Retrieving Zip Code 5674
Retrieving Zip Code 5675
Retrieving Zip Code 5676
Retrieving Zip Code 5677
Retrieving Zip Code 5678
Retrieving Zip Code 5679
Retrieving Zip Code 5680
Retrieving Zip Code 5681
Retrieving Zip Code 5682
Retrieving Zip Code 5683
Retrieving Zip Code 5684
Retrieving Zip Code 5685
Retrieving Zip Code 5686
Retrieving Zip Code 5687
Retrieving Zip Code 5688
Retrieving Zip Code 5689
Retrieving Zip Code 5690
Retrieving Zip Code 5691
Retrieving Zip Code 5692
Retrieving Zip Code 5693
Retrieving Zip Code 5694
Retrieving Zip Code 5695
Retrieving Zip Code 5696
Retrieving Zip Code 5697
Retrieving Zip Code 5698
Retrieving Zip Code 5699
Retrieving Zip Code 5700
Retrieving Zip Code 5701
Retrieving Zip Code 5702
Retrieving Zip Code 5703
Retrieving Zip Code 5704
Retrieving Zip Code 5705
Retrieving Zip Code 5706
Retrieving Zip Code 5707
Retrieving Zip Code 5708
Retrieving Zip Code 5709
Retrieving Zip Code 5710
Retrieving Zip Code 5711
Retrieving Zip Code 5712
Retrieving Zip Code 5713
Retrieving Zip Code 5714
Retrieving Zip Code 5715
Retrieving Zip Code 5716
Retrieving Zip Code 5717
Retrieving Zip Code 5718
Retrieving Zip Code 5719
Retrieving Zip Code 5720
Retrieving Zip Code 5721
Retrieving Zip Code 5722
Retrieving Zip Code 5723
Retrieving Zip Code 5724
Retrieving Zip Code 5725
Retrieving Zip Code 5726
Retrieving Zip Code 5727
Retrieving Zip Code 5728
Retrieving Zip Code 5729
Retrieving Zip Code 5730
Retrieving Zip Code 5731
Retrieving Zip Code 5732
Retrieving Zip Code 5733
Retrieving Zip Code 5734
Retrieving Zip Code 5735
Retrieving Zip Code 5736
Retrieving Zip Code 5737
Retrieving Zip Code 5738
Retrieving Zip Code 5739
Retrieving Zip Code 5740
Retrieving Zip Code 5741
Retrieving Zip Code 5742
Retrieving Zip Code 5743
Retrieving Zip Code 5744
Retrieving Zip Code 5745
Retrieving Zip Code 5746
Retrieving Zip Code 5747
Retrieving Zip Code 5748
Retrieving Zip Code 5749
Retrieving Zip Code 5750
Retrieving Zip Code 5751
Retrieving Zip Code 5752
Retrieving Zip Code 5753
Retrieving Zip Code 5754
Retrieving Zip Code 5755
Retrieving Zip Code 5756
Retrieving Zip Code 5757
Retrieving Zip Code 5758
Retrieving Zip Code 5759
Retrieving Zip Code 5760
Retrieving Zip Code 5761
Retrieving Zip Code 5762
Retrieving Zip Code 5763
Retrieving Zip Code 5764
Retrieving Zip Code 5765
Retrieving Zip Code 5766
Retrieving Zip Code 5767
Retrieving Zip Code 5768
Retrieving Zip Code 5769
Retrieving Zip Code 5770
Retrieving Zip Code 5771
Retrieving Zip Code 5772
Retrieving Zip Code 5773
Retrieving Zip Code 5774
Retrieving Zip Code 5775
Retrieving Zip Code 5776
Retrieving Zip Code 5777
Retrieving Zip Code 5778
Retrieving Zip Code 5779
Retrieving Zip Code 5780
Retrieving Zip Code 5781
Retrieving Zip Code 5782
Retrieving Zip Code 5783
Retrieving Zip Code 5784
Retrieving Zip Code 5785
Retrieving Zip Code 5786
Retrieving Zip Code 5787
Retrieving Zip Code 5788
Retrieving Zip Code 5789
Retrieving Zip Code 5790
Retrieving Zip Code 5791
Retrieving Zip Code 5792
Retrieving Zip Code 5793
Retrieving Zip Code 5794
Retrieving Zip Code 5795
Retrieving Zip Code 5796
Retrieving Zip Code 5797
Retrieving Zip Code 5798
Retrieving Zip Code 5799
Retrieving Zip Code 5800
Retrieving Zip Code 5801
Retrieving Zip Code 5802
Retrieving Zip Code 5803
Retrieving Zip Code 5804
Retrieving Zip Code 5805
Retrieving Zip Code 5806
Retrieving Zip Code 5807
Retrieving Zip Code 5808
Retrieving Zip Code 5809
Retrieving Zip Code 5810
Retrieving Zip Code 5811
Retrieving Zip Code 5812
Retrieving Zip Code 5813
Retrieving Zip Code 5814
Retrieving Zip Code 5815
Retrieving Zip Code 5816
Retrieving Zip Code 5817
Retrieving Zip Code 5818
Retrieving Zip Code 5819
Retrieving Zip Code 5820
Retrieving Zip Code 5821
Retrieving Zip Code 5822
Retrieving Zip Code 5823
Retrieving Zip Code 5824
Retrieving Zip Code 5825
Retrieving Zip Code 5826
Retrieving Zip Code 5827
Retrieving Zip Code 5828
Retrieving Zip Code 5829
Retrieving Zip Code 5830
Retrieving Zip Code 5831
Retrieving Zip Code 5832
Retrieving Zip Code 5833
Retrieving Zip Code 5834
Retrieving Zip Code 5835
Retrieving Zip Code 5836
Retrieving Zip Code 5837
Retrieving Zip Code 5838
Retrieving Zip Code 5839
Retrieving Zip Code 5840
Retrieving Zip Code 5841
Retrieving Zip Code 5842
Retrieving Zip Code 5843
Retrieving Zip Code 5844
Retrieving Zip Code 5845
Retrieving Zip Code 5846
Retrieving Zip Code 5847
Retrieving Zip Code 5848
Retrieving Zip Code 5849
Retrieving Zip Code 5850
Retrieving Zip Code 5851
Retrieving Zip Code 5852
Retrieving Zip Code 5853
Retrieving Zip Code 5854
Retrieving Zip Code 5855
Retrieving Zip Code 5856
Retrieving Zip Code 5857
Retrieving Zip Code 5858
Retrieving Zip Code 5859
Retrieving Zip Code 5860
Retrieving Zip Code 5861
Retrieving Zip Code 5862
Retrieving Zip Code 5863
Retrieving Zip Code 5864
Retrieving Zip Code 5865
Retrieving Zip Code 5866
Retrieving Zip Code 5867
Retrieving Zip Code 5868
Retrieving Zip Code 5869
Retrieving Zip Code 5870
Retrieving Zip Code 5871
Retrieving Zip Code 5872
Retrieving Zip Code 5873
Retrieving Zip Code 5874
Retrieving Zip Code 5875
Retrieving Zip Code 5876
Retrieving Zip Code 5877
Retrieving Zip Code 5878
Retrieving Zip Code 5879
Retrieving Zip Code 5880
Retrieving Zip Code 5881
Retrieving Zip Code 5882
Retrieving Zip Code 5883
Retrieving Zip Code 5884
Retrieving Zip Code 5885
Retrieving Zip Code 5886
Retrieving Zip Code 5887
Retrieving Zip Code 5888
Retrieving Zip Code 5889
Retrieving Zip Code 5890
Retrieving Zip Code 5891
Retrieving Zip Code 5892
Retrieving Zip Code 5893
Retrieving Zip Code 5894
Retrieving Zip Code 5895
Retrieving Zip Code 5896
Retrieving Zip Code 5897
Retrieving Zip Code 5898
Retrieving Zip Code 5899
Retrieving Zip Code 5900
Retrieving Zip Code 5901
Retrieving Zip Code 5902
Retrieving Zip Code 5903
Retrieving Zip Code 5904
Retrieving Zip Code 5905
Retrieving Zip Code 5906
Retrieving Zip Code 5907
Retrieving Zip Code 5908
Retrieving Zip Code 5909
Retrieving Zip Code 5910
Retrieving Zip Code 5911
Retrieving Zip Code 5912
Retrieving Zip Code 5913
Retrieving Zip Code 5914
Retrieving Zip Code 5915
Retrieving Zip Code 5916
Retrieving Zip Code 5917
Retrieving Zip Code 5918
Retrieving Zip Code 5919
Retrieving Zip Code 5920
Retrieving Zip Code 5921
Retrieving Zip Code 5922
Retrieving Zip Code 5923
Retrieving Zip Code 5924
Retrieving Zip Code 5925
Retrieving Zip Code 5926
Retrieving Zip Code 5927
Retrieving Zip Code 5928
Retrieving Zip Code 5929
Retrieving Zip Code 5930
Retrieving Zip Code 5931
Retrieving Zip Code 5932
Retrieving Zip Code 5933
Retrieving Zip Code 5934
Retrieving Zip Code 5935
Retrieving Zip Code 5936
Retrieving Zip Code 5937
Retrieving Zip Code 5938
Retrieving Zip Code 5939
Retrieving Zip Code 5940
Retrieving Zip Code 5941
Retrieving Zip Code 5942
Retrieving Zip Code 5943
Retrieving Zip Code 5944
Retrieving Zip Code 5945
Retrieving Zip Code 5946
Retrieving Zip Code 5947
Retrieving Zip Code 5948
Retrieving Zip Code 5949
Retrieving Zip Code 5950
user system elapsed
48.477 159.656 232.834
Feature engineering.
Inspecting incidences of consecutive Stop_IDs. This is done because investigation showed that many conseutive events occurr at the same Stop_ID, but with various Dwell_Times, Odometer_Distances, etc. All of which affect calculations and analyses.
Create data on the runs (consecutive Stop_IDs).
StopID_Runs <- rle(AllDays_StopIDNew$StopID_Clean)
StopID_Runs$ends <- cumsum(StopID_Runs$lengths)
StopID_Runs$starts <- ifelse(is.na(lag(StopID_Runs$ends)
),
1,
lag(StopID_Runs$ends) + 1
)
str(StopID_Runs)
List of 4
$ lengths: int [1:2809529] 2 1 1 1 1 2 1 1 1 1 ...
$ values : chr [1:2809529] "5004572" "5004573" "5002210" "5002209" ...
$ ends : int [1:2809529] 2 3 4 5 6 8 9 10 11 12 ...
$ starts : num [1:2809529] 1 3 4 5 6 7 9 10 11 12 ...
- attr(*, "class")= chr "rle"
# class(StopID_Runs)
#
# StopID_Runs_df <- data.frame(unclass(StopID_Runs))
# str(StopID_Runs_df)
# class(StopID_Runs_df)
# rm(StopID_Runs_df)
Trying to link data on RunsGroups with the original data (AllDays_Sorted). The goal is to select only one record per RunsGroup - that being the record with the longest Dwell_Time.
I attempted this computation using both data.frames (dplyr) and data.tables (data.table). However, with 2,809,062 rows in one dataset and 3,119,443 rows in the other dataset, the current computation time is over 5 days…so I’m trying a different strategy to only select the first record in a run.
# Create a RunsGroup variable for each run
# StopID_Runs_df$RunsGroup <- paste0("g", seq(1:nrow(StopID_Runs_df)
# )
# )
#
# str(StopID_Runs_df)
# head(StopID_Runs_df, 25)
# tail(StopID_Runs_df, 25)
#
# StopID_Runs_df <- StopID_Runs_df %>%
# mutate(RowNum = row_number()
# )
#
# str(StopID_Runs_df)
# head(StopID_Runs_df, 25)
# tail(StopID_Runs_df, 25)
#
#
# # Converting to data.tables for, hopefully, improved performance (speed) in computation
# StopID_Runs_dt <- data.table(StopID_Runs_df)
# setkey(StopID_Runs_dt, RowNum)
# str(StopID_Runs_dt)
#
# AllDays_Sorted_dt <- data.table(AllDays_Sorted)
# setkey(AllDays_Sorted_dt, RowNum_OG)
# str(AllDays_Sorted_dt)
# # rm(AllDays_Sorted_dt)
#
#
# # Actual loop to perform the computations and link to original data (AllDays_Sorted_dt)
# GroupData <- list()
# for(i in 1:nrow(StopID_Runs_dt)
# ) {
# assign(paste0("group_", i),
# StopID_Runs_dt[RowNum == i, RunsGroup]
# )
#
# ##### The code below is the same code as above, but done with dplyr #####
#
# # assign(paste0("group_", i),
# # filter(StopID_Runs_df,
# # RowNum == i
# # ) %>%
# # select(RunsGroup)
# # )
#
# assign(paste0("group_", i, "_start"),
# StopID_Runs_dt[RowNum == i, starts]
# )
#
# assign(paste0("group_", i, "_end"),
# StopID_Runs_dt[RowNum == i, ends]
# )
#
# assign(paste0("group_", i, "_rows"),
# AllDays_Sorted_dt[RowNum_OG >= as.numeric(get(paste0("group_", i, "_start")
# )
# ) &
# RowNum_OG <= as.numeric(get(paste0("group_", i, "_end")
# )
# ),
# RunsGroup := as.character(get(paste0("group_", i)
# )
# )
# ]
#
# ##### The code below is the same as the code above, but done with dplyr #####
#
# # filter(AllDays_Sorted,
# # between(RowNum_OG,
# # as.numeric(get(paste0("group_", i, "_start")
# # )
# # ),
# # as.numeric(get(paste0("group_", i, "_end")
# # )
# # )
# # )
# # ) %>%
# # mutate(RunsGroup = as.character(get(paste0("group_", i)
# # )
# # )
# # )
# )
#
# GroupData[[i]] <- get(paste0("group_", i, "_rows"))
#
# message("Processing Group ", i, " of 2,809,062")
# }
#
#
# GroupData_df <- rbind.fill(GroupData)
# str(GroupData_df)
# head(GroupData_df)
# tail(GroupData_df)
# # rm(GroupData_df)
#
#
# group_1
# group_1_start
# group_1_end
# group_1_rows
# group_2_rows
# group_3_rows
# group_50_rows
# str(group_50_rows)
# group_2809062_rows
# GroupData[[1]]
# GroupData[[50]]
#
#
# ##### Testing Area (Below) #####
# ##### Testing Area (Below) #####
# ##### Testing Area (Below) #####
#
# # head(StopID_Runs$starts, 20)
# # head(AllDays_NewOrder$Stop_ID, 20)
# #
# #
# # dat <- as.data.frame(c(1,1,7,7,7,9,6,8,2,2,2,1,1,1,1,1))
# # colnames(dat)[1] <- "dat"
# # r <- rle(dat$dat)
# # dat$run <- rep(r$lengths, r$lengths)
# # dat$runLag <- lag(dat$run)
# # dat$cond <- rep(r$values, r$lengths)
# # dat
# # View(dat)
When consecutive Stop_ID occurrs, only take the first occurrence. This is done because the computation time to select only the record with the longest Dwell_Time for each run was too long (over 5 days).
This is probably less than ideal with regards to Dwell_Time, but should not make much difference for calculations of travel time, speed, etc.
AllDays_FirstStopID <- AllDays_StopIDNew[StopID_Runs$starts, ]
dim(AllDays_StopIDNew)
[1] 3119443 22
dim(AllDays_FirstStopID)
[1] 2809529 22
nrow(AllDays_StopIDNew) - nrow(AllDays_FirstStopID)
[1] 309914
rm(AllDays_StopIDNew)
rm(StopID_Runs)
str(AllDays_FirstStopID)
'data.frame': 2809529 obs. of 22 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Stop_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description: Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Odometer_Distance: int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
Feature engineering.
Creating new variables.
AllDays_AddVars <- mutate(AllDays_FirstStopID,
Odometer_Distance_Mi = Odometer_Distance / 5280, #5,280 feet in 1 mile
Dwell_Time2 = as.numeric(Departure_Time - Event_Time),
Event_Time_Yr = as.integer(year(Event_Time)),
Event_Time_Mth = as.integer(month(Event_Time)),
Event_Time_Date = day(Event_Time),
Event_Time_Day = wday(Event_Time, label = TRUE),
Event_Time_Hr = hour(Event_Time),
Event_Time_Min = minute(Event_Time),
Event_Time_HrGroup = factor(ifelse(Event_Time_Hr < 3,
"Group0_2",
ifelse(Event_Time_Hr < 6,
"Group3_5",
ifelse(Event_Time_Hr < 9,
"Group6_8",
ifelse(Event_Time_Hr < 12,
"Group9_11",
ifelse(Event_Time_Hr < 15,
"Group12_14",
ifelse(Event_Time_Hr < 18,
"Group15_17",
ifelse(Event_Time_Hr < 21,
"Group18_20",
ifelse(Event_Time_Hr < 24,
"Group21_23"
)))))))),
levels = c("Group0_2",
"Group3_5",
"Group6_8",
"Group9_11",
"Group12_14",
"Group15_17",
"Group18_20",
"Group21_23"
),
ordered = TRUE
)
)
rm(AllDays_FirstStopID)
str(AllDays_AddVars)
'data.frame': 2809529 obs. of 31 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Stop_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Odometer_Distance_Mi: num 8.25 8.55 8.79 9.49 9.67 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
# group_by(AllDays_AddVars,
# Event_Time_HrGroup
# ) %>%
# summarise(Cnts = n()
# )
# View(head(filter(AllDays_AddVars,
# Event_Time_Hr == 0
# ),
# 50
# )
# )
# View(head(AllDays_AddVars, 50))
Feature engineering.
Creating more variables. Creating a BusEvent row number for future identification purposes. Then, creating various variables to analyze distance traveled and speed.
AllDays_BusDay <- group_by(AllDays_AddVars,
Bus_ID,
Event_Time_Date
) %>%
mutate(BusDay_EventNum = row_number(), # used to identify Bus movements on a particular date
Route_Lag1 = lag(Route), # used in future analyses to identify Route changes
RouteAlt_Lag1 = lag(RouteAlt), # used in future analyses to identify RouteAlt (direction) changes
Odometer_Distance_Lag1 = lag(Odometer_Distance),
Latitude_L1 = lag(Latitude),
Longitude_L1 = lag(Longitude),
# Lat_Radian = Latitude*pi/180,
# Long_Radian = Longitude*pi/180,
# Lat_Radian_L1 = lag(Lat_Radian),
# Long_Radian_L1 = lag(Long_Radian),
# accounting for potential negative distances
TravelDistance_Ft = ifelse(Odometer_Distance > Odometer_Distance_Lag1,
Odometer_Distance - Odometer_Distance_Lag1,
NA
),
TravelDistance_Mi = TravelDistance_Ft / 5280, #5,280 feet in 1 mile
# TravelDistance_Mi2 = gcd.hf(long1 = Long_Radian_L1,
# lat1 = Lat_Radian_L1,
# long2 = Long_Radian,
# lat2 = Lat_Radian
# ),
TravelDistance_Mi_Hvrs =
# ifelse((is.na(Longitude_L1) | is.na(Latitude_L1)
# ),
# NA,
distHaversine(cbind(Longitude_L1, Latitude_L1),
cbind(Longitude, Latitude)
) * 0.000621371, # 0.000621371 miles = 1 meter
# accounting for potential negative times
TravelTime_Sec = as.numeric(ifelse(Event_Time > lag(Departure_Time),
Event_Time - lag(Departure_Time),
NA
)
),
TravelTime_Hr = TravelTime_Sec / 3600, # 3,600 seconds in 1 hour
# accounting for potential negative or zero travel times
SpeedAvg_Mph = ifelse(TravelTime_Hr > 0,
TravelDistance_Mi / TravelTime_Hr,
NA
),
Start_ID = lag(StopID_Clean),
Start_Desc = lag(Stop_Desc),
StartStop_ID = ifelse(is.na(Start_ID),
paste("NULL", StopID_Clean, sep = "--"),
paste(Start_ID, StopID_Clean, sep = "--")
)
) %>%
as.data.frame()
rm(AllDays_AddVars)
str(AllDays_BusDay)
'data.frame': 2809529 obs. of 46 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Stop_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Odometer_Distance_Mi : num 8.25 8.55 8.79 9.49 9.67 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ BusDay_EventNum : int 1 2 3 4 5 6 7 8 9 10 ...
$ Route_Lag1 : chr NA "S80" "S80" "S80" ...
$ RouteAlt_Lag1 : Factor w/ 14 levels "1","10","11",..: NA 1 1 1 1 1 1 6 6 6 ...
$ Odometer_Distance_Lag1: int NA 43543 45139 46418 50115 51074 51303 55633 56163 56285 ...
$ Latitude_L1 : num NA 38.8 38.8 38.8 38.8 ...
$ Longitude_L1 : num NA -77.2 -77.2 -77.2 -77.2 ...
$ TravelDistance_Ft : int NA 1596 1279 3697 959 229 4330 530 122 977 ...
$ TravelDistance_Mi : num NA 0.302 0.242 0.7 0.182 ...
$ TravelDistance_Mi_Hvrs: num NA 0.15 0.105 0.165 0.832 ...
$ TravelTime_Sec : num NA 180 37 25 190 29 288 52 76 8 ...
$ TravelTime_Hr : num NA 0.05 0.01028 0.00694 0.05278 ...
$ SpeedAvg_Mph : num NA 6.05 23.57 100.83 3.44 ...
$ Start_ID : chr NA "5004572" "5004573" "5002210" ...
$ Start_Desc : chr NA "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ StartStop_ID : chr "NULL--5004572" "5004572--5004573" "5004573--5002210" "5002210--5002209" ...
# summary(AllDays_BusDay)
# View(tail(AllDays_BusDay, 50))
Inspecting for issues with StartStop_ID (where the value is either NA or contains NULL). They ONLY exist when BusDay_EventNum = 1 (which is by design). So everything looks OK.
View(group_by(AllDays_BusDay,
StartStop_ID
) %>%
summarise(
Cnt = n()
) %>%
arrange(desc(Cnt)
)
)
View(filter(AllDays_BusDay,
(is.na(StartStop_ID) |
str_detect(StartStop_ID, "NULL")
) &
BusDay_EventNum != 1
)
)
Stats (quantiles) overall for TravelDistance_Mi.
Quantiles_dt <- AllDays_BusDay %>%
mutate(TD_Mi_q2 = quantile(x = TravelDistance_Mi, probs = 0.02, na.rm = TRUE),
TD_Mi_q98 = quantile(x = TravelDistance_Mi, probs = 0.98, na.rm = TRUE),
TT_Sec_q2 = quantile(x = TravelTime_Sec, probs = 0.02, na.rm = TRUE),
TT_Sec_q98 = quantile(x = TravelTime_Sec, probs = 0.98, na.rm = TRUE),
TT_Hr_q2 = quantile(x = TravelTime_Hr, probs = 0.02, na.rm = TRUE),
TT_Hr_q98 = quantile(x = TravelTime_Hr, probs = 0.98, na.rm = TRUE)
) %>%
data.table()
Stats <- Quantiles_dt %>%
mutate(TD_Mi_Mean = mean(TravelDistance_Mi, na.rm = TRUE),
TD_Mi_Mean_F = mean(TravelDistance_Mi[TD_Mi_q2 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_q98],
na.rm = TRUE
),
TD_Mi_Med = median(TravelDistance_Mi, na.rm = TRUE),
TD_Mi_Med_F = median(TravelDistance_Mi[TD_Mi_q2 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_q98],
na.rm = TRUE
),
TD_Mi_Cnt = sum(!is.na(TravelDistance_Mi)
),
TD_Mi_Cnt_F = sum(!is.na(TravelDistance_Mi[TD_Mi_q2 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_q98]
)
),
TT_Sec_Mean = mean(TravelTime_Sec, na.rm = TRUE),
TT_Sec_Mean_F = mean(TravelTime_Sec[TT_Sec_q2 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_q98],
na.rm = TRUE
),
TT_Sec_Med = median(TravelTime_Sec, na.rm = TRUE),
TT_Sec_Med_F = median(TravelTime_Sec[TT_Sec_q2 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_q98],
na.rm = TRUE
),
TT_Sec_Cnt = sum(!is.na(TravelTime_Sec)
),
TT_Sec_Cnt_F = sum(!is.na(TravelTime_Sec[TT_Sec_q2 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_q98]
)
),
TT_Hr_Mean = mean(TravelTime_Hr, na.rm = TRUE),
TT_Hr_Mean_F = mean(TravelTime_Hr[TT_Hr_q2 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_q98],
na.rm = TRUE
),
TT_Hr_Med = median(TravelTime_Hr, na.rm = TRUE),
TT_Hr_Med_F = median(TravelTime_Hr[TT_Hr_q2 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_q98],
na.rm = TRUE
),
TT_Hr_Cnt = sum(!is.na(TravelTime_Hr)
),
TT_Hr_Cnt_F = sum(!is.na(TravelTime_Hr[TT_Hr_q2 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_q98]
)
)
) %>%
data.frame()
rm(AllDays_BusDay)
rm(Quantiles_dt)
str(Stats)
'data.frame': 2809529 obs. of 70 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Stop_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Odometer_Distance_Mi : num 8.25 8.55 8.79 9.49 9.67 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ BusDay_EventNum : int 1 2 3 4 5 6 7 8 9 10 ...
$ Route_Lag1 : chr NA "S80" "S80" "S80" ...
$ RouteAlt_Lag1 : Factor w/ 14 levels "1","10","11",..: NA 1 1 1 1 1 1 6 6 6 ...
$ Odometer_Distance_Lag1: int NA 43543 45139 46418 50115 51074 51303 55633 56163 56285 ...
$ Latitude_L1 : num NA 38.8 38.8 38.8 38.8 ...
$ Longitude_L1 : num NA -77.2 -77.2 -77.2 -77.2 ...
$ TravelDistance_Ft : int NA 1596 1279 3697 959 229 4330 530 122 977 ...
$ TravelDistance_Mi : num NA 0.302 0.242 0.7 0.182 ...
$ TravelDistance_Mi_Hvrs: num NA 0.15 0.105 0.165 0.832 ...
$ TravelTime_Sec : num NA 180 37 25 190 29 288 52 76 8 ...
$ TravelTime_Hr : num NA 0.05 0.01028 0.00694 0.05278 ...
$ SpeedAvg_Mph : num NA 6.05 23.57 100.83 3.44 ...
$ Start_ID : chr NA "5004572" "5004573" "5002210" ...
$ Start_Desc : chr NA "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ StartStop_ID : chr "NULL--5004572" "5004572--5004573" "5004573--5002210" "5002210--5002209" ...
$ TD_Mi_q2 : num 0.0521 0.0521 0.0521 0.0521 0.0521 ...
$ TD_Mi_q98 : num 0.959 0.959 0.959 0.959 0.959 ...
$ TT_Sec_q2 : num 10 10 10 10 10 10 10 10 10 10 ...
$ TT_Sec_q98 : num 349 349 349 349 349 349 349 349 349 349 ...
$ TT_Hr_q2 : num 0.00278 0.00278 0.00278 0.00278 0.00278 ...
$ TT_Hr_q98 : num 0.0969 0.0969 0.0969 0.0969 0.0969 ...
$ TD_Mi_Mean : num 0.308 0.308 0.308 0.308 0.308 ...
$ TD_Mi_Mean_F : num 0.232 0.232 0.232 0.232 0.232 ...
$ TD_Mi_Med : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Med_F : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Cnt : int 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 ...
$ TD_Mi_Cnt_F : int 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 ...
$ TT_Sec_Mean : num 105 105 105 105 105 ...
$ TT_Sec_Mean_F : num 56.6 56.6 56.6 56.6 56.6 ...
$ TT_Sec_Med : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Med_F : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Sec_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TT_Hr_Mean : num 0.0291 0.0291 0.0291 0.0291 0.0291 ...
$ TT_Hr_Mean_F : num 0.0157 0.0157 0.0157 0.0157 0.0157 ...
$ TT_Hr_Med : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Med_F : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Hr_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
# View(head(Stats, 50))
Stats for StartStop_ID.
Quantiles_SS_dt <- group_by(Stats,
StartStop_ID
) %>%
mutate(TD_Mi_SS_q5 = quantile(x = TravelDistance_Mi, probs = 0.05, na.rm = TRUE),
TD_Mi_SS_q95 = quantile(x = TravelDistance_Mi, probs = 0.95, na.rm = TRUE),
TT_Sec_SS_q5 = quantile(x = TravelTime_Sec, probs = 0.05, na.rm = TRUE),
TT_Sec_SS_q95 = quantile(x = TravelTime_Sec, probs = 0.95, na.rm = TRUE),
TT_Hr_SS_q5 = quantile(x = TravelTime_Hr, probs = 0.05, na.rm = TRUE),
TT_Hr_SS_q95 = quantile(x = TravelTime_Hr, probs = 0.95, na.rm = TRUE)
) %>%
data.table()
Stats_StSt <- group_by(Quantiles_SS_dt,
StartStop_ID
) %>%
mutate(TD_Mi_SS_Mean = mean(TravelDistance_Mi, na.rm = TRUE),
TD_Mi_SS_Mean_F = mean(TravelDistance_Mi[TD_Mi_SS_q5 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_SS_q95],
na.rm = TRUE
),
TD_Mi_SS_Med = median(TravelDistance_Mi, na.rm = TRUE),
TD_Mi_SS_Med_F = median(TravelDistance_Mi[TD_Mi_SS_q5 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_SS_q95],
na.rm = TRUE
),
TD_Mi_SS_Cnt = sum(!is.na(TravelDistance_Mi)
),
TD_Mi_SS_Cnt_F = sum(!is.na(TravelDistance_Mi[TD_Mi_SS_q5 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_SS_q95]
)
),
TT_Sec_SS_Mean = mean(TravelTime_Sec, na.rm = TRUE),
TT_Sec_SS_Mean_F = mean(TravelTime_Sec[TT_Sec_SS_q5 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_SS_q95],
na.rm = TRUE
),
TT_Sec_SS_Med = median(TravelTime_Sec, na.rm = TRUE),
TT_Sec_SS_Med_F = median(TravelTime_Sec[TT_Sec_SS_q5 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_SS_q95],
na.rm = TRUE
),
TT_Sec_SS_Cnt = sum(!is.na(TravelTime_Sec)),
TT_Sec_SS_Cnt_F = sum(!is.na(TravelTime_Sec[TT_Sec_SS_q5 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_SS_q95]
)
),
TT_Hr_SS_Mean = mean(TravelTime_Hr, na.rm = TRUE),
TT_Hr_SS_Mean_F = mean(TravelTime_Hr[TT_Hr_SS_q5 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_SS_q95],
na.rm = TRUE
),
TT_Hr_SS_Med = median(TravelTime_Hr, na.rm = TRUE),
TT_Hr_SS_Med_F = median(TravelTime_Hr[TT_Hr_SS_q5 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_SS_q95],
na.rm = TRUE
),
TT_Hr_SS_Cnt = sum(!is.na(TravelTime_Hr)),
TT_Hr_SS_Cnt_F = sum(!is.na(TravelTime_Hr[TT_Hr_SS_q5 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_SS_q95]
)
)
) %>%
data.frame()
rm(Stats)
rm(Quantiles_SS_dt)
str(Stats_StSt)
'data.frame': 2809529 obs. of 94 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Stop_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Odometer_Distance_Mi : num 8.25 8.55 8.79 9.49 9.67 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ BusDay_EventNum : int 1 2 3 4 5 6 7 8 9 10 ...
$ Route_Lag1 : chr NA "S80" "S80" "S80" ...
$ RouteAlt_Lag1 : Factor w/ 14 levels "1","10","11",..: NA 1 1 1 1 1 1 6 6 6 ...
$ Odometer_Distance_Lag1: int NA 43543 45139 46418 50115 51074 51303 55633 56163 56285 ...
$ Latitude_L1 : num NA 38.8 38.8 38.8 38.8 ...
$ Longitude_L1 : num NA -77.2 -77.2 -77.2 -77.2 ...
$ TravelDistance_Ft : int NA 1596 1279 3697 959 229 4330 530 122 977 ...
$ TravelDistance_Mi : num NA 0.302 0.242 0.7 0.182 ...
$ TravelDistance_Mi_Hvrs: num NA 0.15 0.105 0.165 0.832 ...
$ TravelTime_Sec : num NA 180 37 25 190 29 288 52 76 8 ...
$ TravelTime_Hr : num NA 0.05 0.01028 0.00694 0.05278 ...
$ SpeedAvg_Mph : num NA 6.05 23.57 100.83 3.44 ...
$ Start_ID : chr NA "5004572" "5004573" "5002210" ...
$ Start_Desc : chr NA "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ StartStop_ID : chr "NULL--5004572" "5004572--5004573" "5004573--5002210" "5002210--5002209" ...
$ TD_Mi_q2 : num 0.0521 0.0521 0.0521 0.0521 0.0521 ...
$ TD_Mi_q98 : num 0.959 0.959 0.959 0.959 0.959 ...
$ TT_Sec_q2 : num 10 10 10 10 10 10 10 10 10 10 ...
$ TT_Sec_q98 : num 349 349 349 349 349 349 349 349 349 349 ...
$ TT_Hr_q2 : num 0.00278 0.00278 0.00278 0.00278 0.00278 ...
$ TT_Hr_q98 : num 0.0969 0.0969 0.0969 0.0969 0.0969 ...
$ TD_Mi_Mean : num 0.308 0.308 0.308 0.308 0.308 ...
$ TD_Mi_Mean_F : num 0.232 0.232 0.232 0.232 0.232 ...
$ TD_Mi_Med : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Med_F : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Cnt : int 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 ...
$ TD_Mi_Cnt_F : int 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 ...
$ TT_Sec_Mean : num 105 105 105 105 105 ...
$ TT_Sec_Mean_F : num 56.6 56.6 56.6 56.6 56.6 ...
$ TT_Sec_Med : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Med_F : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Sec_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TT_Hr_Mean : num 0.0291 0.0291 0.0291 0.0291 0.0291 ...
$ TT_Hr_Mean_F : num 0.0157 0.0157 0.0157 0.0157 0.0157 ...
$ TT_Hr_Med : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Med_F : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Hr_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TD_Mi_SS_q5 : num NA 0.0252 0.2422 0.7324 0.0794 ...
$ TD_Mi_SS_q95 : num NA 0.626 0.242 1.008 0.176 ...
$ TT_Sec_SS_q5 : num NA 11.9 37 30.5 172.9 ...
$ TT_Sec_SS_q95 : num NA 346.3 37 75.8 189.1 ...
$ TT_Hr_SS_q5 : num NA 0.00331 0.01028 0.00849 0.04803 ...
$ TT_Hr_SS_q95 : num NA 0.0962 0.0103 0.0211 0.0525 ...
$ TD_Mi_SS_Mean : num NaN 0.437 0.242 0.908 0.128 ...
$ TD_Mi_SS_Mean_F : num NaN 0.457 0.242 0.977 NaN ...
$ TD_Mi_SS_Med : num NA 0.512 0.242 0.962 0.128 ...
$ TD_Mi_SS_Med_F : num NA 0.512 0.242 1.008 NA ...
$ TD_Mi_SS_Cnt : int 0 14 1 4 2 87 22 118 91 11 ...
$ TD_Mi_SS_Cnt_F : int 0 12 1 3 0 77 18 106 81 9 ...
$ TT_Sec_SS_Mean : num NaN 215.8 37 58.2 181 ...
$ TT_Sec_SS_Mean_F : num NaN 218.9 37 65.5 NaN ...
$ TT_Sec_SS_Med : num NA 223.5 37 65.5 181 ...
$ TT_Sec_SS_Med_F : num NA 223.5 37 65.5 NA ...
$ TT_Sec_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Sec_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
$ TT_Hr_SS_Mean : num NaN 0.0599 0.0103 0.0162 0.0503 ...
$ TT_Hr_SS_Mean_F : num NaN 0.0608 0.0103 0.0182 NaN ...
$ TT_Hr_SS_Med : num NA 0.0621 0.0103 0.0182 0.0503 ...
$ TT_Hr_SS_Med_F : num NA 0.0621 0.0103 0.0182 NA ...
$ TT_Hr_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Hr_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
# View(head(Stats_StSt, 50))
Stats for StartStop_ID with Event_Time_HrGroup.
Quantiles_SSHG_dt <- group_by(Stats_StSt,
StartStop_ID,
Event_Time_HrGroup
) %>%
mutate(TD_Mi_SSHG_q5 = quantile(x = TravelDistance_Mi, probs = 0.05, na.rm = TRUE),
TD_Mi_SSHG_q95 = quantile(x = TravelDistance_Mi, probs = 0.95, na.rm = TRUE),
TT_Sec_SSHG_q5 = quantile(x = TravelTime_Sec, probs = 0.05, na.rm = TRUE),
TT_Sec_SSHG_q95 = quantile(x = TravelTime_Sec, probs = 0.95, na.rm = TRUE),
TT_Hr_SSHG_q5 = quantile(x = TravelTime_Hr, probs = 0.05, na.rm = TRUE),
TT_Hr_SSHG_q95 = quantile(x = TravelTime_Hr, probs = 0.95, na.rm = TRUE)
) %>%
data.table()
Stats_StSt_HrGrp <- group_by(Quantiles_SSHG_dt,
StartStop_ID,
Event_Time_HrGroup
) %>%
mutate(TD_Mi_SSHG_Mean = mean(TravelDistance_Mi, na.rm = TRUE),
TD_Mi_SSHG_Mean_F = mean(TravelDistance_Mi[TD_Mi_SSHG_q5 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_SSHG_q95],
na.rm = TRUE
),
TD_Mi_SSHG_Med = median(TravelDistance_Mi, na.rm = TRUE),
TD_Mi_SSHG_Med_F = median(TravelDistance_Mi[TD_Mi_SSHG_q5 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_SSHG_q95],
na.rm = TRUE
),
TD_Mi_SSHG_Cnt = sum(!is.na(TravelDistance_Mi)
),
TD_Mi_SSHG_Cnt_F = sum(!is.na(TravelDistance_Mi[TD_Mi_SSHG_q5 <= TravelDistance_Mi & TravelDistance_Mi <= TD_Mi_SSHG_q95]
)
),
TT_Sec_SSHG_Mean = mean(TravelTime_Sec, na.rm = TRUE),
TT_Sec_SSHG_Mean_F = mean(TravelTime_Sec[TT_Sec_SSHG_q5 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_SSHG_q95],
na.rm = TRUE
),
TT_Sec_SSHG_Med = median(TravelTime_Sec, na.rm = TRUE),
TT_Sec_SSHG_Med_F = median(TravelTime_Sec[TT_Sec_SSHG_q5 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_SSHG_q95],
na.rm = TRUE
),
TT_Sec_SSHG_Cnt = sum(!is.na(TravelTime_Sec)),
TT_Sec_SSHG_Cnt_F = sum(!is.na(TravelTime_Sec[TT_Sec_SSHG_q5 <= TravelTime_Sec & TravelTime_Sec <= TT_Sec_SSHG_q95]
)
),
TT_Hr_SSHG_Mean = mean(TravelTime_Hr, na.rm = TRUE),
TT_Hr_SSHG_Mean_F = mean(TravelTime_Hr[TT_Hr_SSHG_q5 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_SSHG_q95],
na.rm = TRUE
),
TT_Hr_SSHG_Med = median(TravelTime_Hr, na.rm = TRUE),
TT_Hr_SSHG_Med_F = median(TravelTime_Hr[TT_Hr_SSHG_q5 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_SSHG_q95],
na.rm = TRUE
),
TT_Hr_SSHG_Cnt = sum(!is.na(TravelTime_Hr)),
TT_Hr_SSHG_Cnt_F = sum(!is.na(TravelTime_Hr[TT_Hr_SSHG_q5 <= TravelTime_Hr & TravelTime_Hr <= TT_Hr_SSHG_q95]
)
)
) %>%
data.frame()
rm(Stats_StSt)
rm(Quantiles_SSHG_dt)
str(Stats_StSt_HrGrp)
'data.frame': 2809529 obs. of 118 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Stop_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Odometer_Distance_Mi : num 8.25 8.55 8.79 9.49 9.67 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ BusDay_EventNum : int 1 2 3 4 5 6 7 8 9 10 ...
$ Route_Lag1 : chr NA "S80" "S80" "S80" ...
$ RouteAlt_Lag1 : Factor w/ 14 levels "1","10","11",..: NA 1 1 1 1 1 1 6 6 6 ...
$ Odometer_Distance_Lag1: int NA 43543 45139 46418 50115 51074 51303 55633 56163 56285 ...
$ Latitude_L1 : num NA 38.8 38.8 38.8 38.8 ...
$ Longitude_L1 : num NA -77.2 -77.2 -77.2 -77.2 ...
$ TravelDistance_Ft : int NA 1596 1279 3697 959 229 4330 530 122 977 ...
$ TravelDistance_Mi : num NA 0.302 0.242 0.7 0.182 ...
$ TravelDistance_Mi_Hvrs: num NA 0.15 0.105 0.165 0.832 ...
$ TravelTime_Sec : num NA 180 37 25 190 29 288 52 76 8 ...
$ TravelTime_Hr : num NA 0.05 0.01028 0.00694 0.05278 ...
$ SpeedAvg_Mph : num NA 6.05 23.57 100.83 3.44 ...
$ Start_ID : chr NA "5004572" "5004573" "5002210" ...
$ Start_Desc : chr NA "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ StartStop_ID : chr "NULL--5004572" "5004572--5004573" "5004573--5002210" "5002210--5002209" ...
$ TD_Mi_q2 : num 0.0521 0.0521 0.0521 0.0521 0.0521 ...
$ TD_Mi_q98 : num 0.959 0.959 0.959 0.959 0.959 ...
$ TT_Sec_q2 : num 10 10 10 10 10 10 10 10 10 10 ...
$ TT_Sec_q98 : num 349 349 349 349 349 349 349 349 349 349 ...
$ TT_Hr_q2 : num 0.00278 0.00278 0.00278 0.00278 0.00278 ...
$ TT_Hr_q98 : num 0.0969 0.0969 0.0969 0.0969 0.0969 ...
$ TD_Mi_Mean : num 0.308 0.308 0.308 0.308 0.308 ...
$ TD_Mi_Mean_F : num 0.232 0.232 0.232 0.232 0.232 ...
$ TD_Mi_Med : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Med_F : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Cnt : int 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 ...
$ TD_Mi_Cnt_F : int 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 ...
$ TT_Sec_Mean : num 105 105 105 105 105 ...
$ TT_Sec_Mean_F : num 56.6 56.6 56.6 56.6 56.6 ...
$ TT_Sec_Med : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Med_F : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Sec_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TT_Hr_Mean : num 0.0291 0.0291 0.0291 0.0291 0.0291 ...
$ TT_Hr_Mean_F : num 0.0157 0.0157 0.0157 0.0157 0.0157 ...
$ TT_Hr_Med : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Med_F : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Hr_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TD_Mi_SS_q5 : num NA 0.0252 0.2422 0.7324 0.0794 ...
$ TD_Mi_SS_q95 : num NA 0.626 0.242 1.008 0.176 ...
$ TT_Sec_SS_q5 : num NA 11.9 37 30.5 172.9 ...
$ TT_Sec_SS_q95 : num NA 346.3 37 75.8 189.1 ...
$ TT_Hr_SS_q5 : num NA 0.00331 0.01028 0.00849 0.04803 ...
$ TT_Hr_SS_q95 : num NA 0.0962 0.0103 0.0211 0.0525 ...
$ TD_Mi_SS_Mean : num NaN 0.437 0.242 0.908 0.128 ...
$ TD_Mi_SS_Mean_F : num NaN 0.457 0.242 0.977 NaN ...
$ TD_Mi_SS_Med : num NA 0.512 0.242 0.962 0.128 ...
$ TD_Mi_SS_Med_F : num NA 0.512 0.242 1.008 NA ...
$ TD_Mi_SS_Cnt : int 0 14 1 4 2 87 22 118 91 11 ...
$ TD_Mi_SS_Cnt_F : int 0 12 1 3 0 77 18 106 81 9 ...
$ TT_Sec_SS_Mean : num NaN 215.8 37 58.2 181 ...
$ TT_Sec_SS_Mean_F : num NaN 218.9 37 65.5 NaN ...
$ TT_Sec_SS_Med : num NA 223.5 37 65.5 181 ...
$ TT_Sec_SS_Med_F : num NA 223.5 37 65.5 NA ...
$ TT_Sec_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Sec_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
$ TT_Hr_SS_Mean : num NaN 0.0599 0.0103 0.0162 0.0503 ...
$ TT_Hr_SS_Mean_F : num NaN 0.0608 0.0103 0.0182 NaN ...
$ TT_Hr_SS_Med : num NA 0.0621 0.0103 0.0182 0.0503 ...
$ TT_Hr_SS_Med_F : num NA 0.0621 0.0103 0.0182 NA ...
$ TT_Hr_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Hr_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
$ TD_Mi_SSHG_q5 : num NA 0.0996 0.2422 0.7002 0.1816 ...
$ TD_Mi_SSHG_q95 : num NA 0.627 0.242 0.7 0.182 ...
$ TT_Sec_SSHG_q5 : num NA 59.6 37 25 190 11.6 236 51.5 55 8.8 ...
$ TT_Sec_SSHG_q95 : num NA 276 37 25 190 ...
$ TT_Hr_SSHG_q5 : num NA 0.01656 0.01028 0.00694 0.05278 ...
[list output truncated]
# View(head(Stats_StSt_HrGrp, 50))
Feature engineering.
Creating a BusEventRoute row number, and a RouteAlt_Lag1 indicator for future identification purposes.
# rm(Quantiles_dt)
# rm(Quantiles_SS_dt)
# rm(AllDays_BusDay)
# rm(Quantiles_SSHG_dt)
# rm(Stats_StSt)
# AllDays_BusDayRoute <- group_by(Stats_StSt_HrGrp,
# Bus_ID,
# Event_Time_Date,
# Route
# ) %>%
# mutate(RouteAlt_Lag2 = lag(RouteAlt) # used in future analyses to identify RouteAlt (direction) changes
#
# # Odometer_Distance_Lag1 = lag(Odometer_Distance),
# #
# # # accounting for potential negative distances
# # TravelDistance_Ft = ifelse(Odometer_Distance >= Odometer_Distance_Lag1,
# # Odometer_Distance - Odometer_Distance_Lag1,
# # NA
# # ),
# # TravelDistance_Mi = TravelDistance_Ft / 5280, #5,280 feet in 1 mile
# #
# # # accounting for potential negative times
# # TravelTime_Sec = as.numeric(ifelse(Event_Time >= lag(Departure_Time),
# # Event_Time - lag(Departure_Time),
# # NA
# # )
# # ),
# # TravelTime_Hr = TravelTime_Sec / 3600, # 3,600 seconds in 1 hour
# #
# # # accounting for potential negative or zero travel times
# # SpeedAvg_Mph = ifelse(TravelTime_Hr > 0,
# # TravelDistance_Mi / TravelTime_Hr,
# # NA
# # )
# ) %>%
# data.frame()
#
# rm(Stats_StSt_HrGrp)
# str(AllDays_BusDayRoute)
Feature engineering.
Calculating a variable to know if the RouteAlt changed. Could be useful in helping identifying weirdness in calculated distances and speeds.
# rm(Stats_StSt_HrGrp)
AllDays_DirChange <- Stats_StSt_HrGrp %>% # AllDays_BusDayRoute %>%
mutate(RteChange = ifelse(Route == Route_Lag1,
"Same",
"Change"
),
RteChange2 = factor(ifelse(is.na(RteChange),
"Change",
RteChange
)
),
DirChange = ifelse(RouteAlt == RouteAlt_Lag1,
"Same",
"Change"
),
DirChange2 = factor(ifelse(is.na(DirChange),
"Change",
DirChange
)
)
)
# rm(AllDays_BusDayRoute)
rm(Stats_StSt_HrGrp)
str(AllDays_DirChange)
'data.frame': 2809529 obs. of 122 variables:
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Stop_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ StopID_New : int NA NA NA NA NA NA NA NA NA NA ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Odometer_Distance_Mi : num 8.25 8.55 8.79 9.49 9.67 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ BusDay_EventNum : int 1 2 3 4 5 6 7 8 9 10 ...
$ Route_Lag1 : chr NA "S80" "S80" "S80" ...
$ RouteAlt_Lag1 : Factor w/ 14 levels "1","10","11",..: NA 1 1 1 1 1 1 6 6 6 ...
$ Odometer_Distance_Lag1: int NA 43543 45139 46418 50115 51074 51303 55633 56163 56285 ...
$ Latitude_L1 : num NA 38.8 38.8 38.8 38.8 ...
$ Longitude_L1 : num NA -77.2 -77.2 -77.2 -77.2 ...
$ TravelDistance_Ft : int NA 1596 1279 3697 959 229 4330 530 122 977 ...
$ TravelDistance_Mi : num NA 0.302 0.242 0.7 0.182 ...
$ TravelDistance_Mi_Hvrs: num NA 0.15 0.105 0.165 0.832 ...
$ TravelTime_Sec : num NA 180 37 25 190 29 288 52 76 8 ...
$ TravelTime_Hr : num NA 0.05 0.01028 0.00694 0.05278 ...
$ SpeedAvg_Mph : num NA 6.05 23.57 100.83 3.44 ...
$ Start_ID : chr NA "5004572" "5004573" "5002210" ...
$ Start_Desc : chr NA "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ StartStop_ID : chr "NULL--5004572" "5004572--5004573" "5004573--5002210" "5002210--5002209" ...
$ TD_Mi_q2 : num 0.0521 0.0521 0.0521 0.0521 0.0521 ...
$ TD_Mi_q98 : num 0.959 0.959 0.959 0.959 0.959 ...
$ TT_Sec_q2 : num 10 10 10 10 10 10 10 10 10 10 ...
$ TT_Sec_q98 : num 349 349 349 349 349 349 349 349 349 349 ...
$ TT_Hr_q2 : num 0.00278 0.00278 0.00278 0.00278 0.00278 ...
$ TT_Hr_q98 : num 0.0969 0.0969 0.0969 0.0969 0.0969 ...
$ TD_Mi_Mean : num 0.308 0.308 0.308 0.308 0.308 ...
$ TD_Mi_Mean_F : num 0.232 0.232 0.232 0.232 0.232 ...
$ TD_Mi_Med : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Med_F : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Cnt : int 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 ...
$ TD_Mi_Cnt_F : int 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 ...
$ TT_Sec_Mean : num 105 105 105 105 105 ...
$ TT_Sec_Mean_F : num 56.6 56.6 56.6 56.6 56.6 ...
$ TT_Sec_Med : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Med_F : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Sec_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TT_Hr_Mean : num 0.0291 0.0291 0.0291 0.0291 0.0291 ...
$ TT_Hr_Mean_F : num 0.0157 0.0157 0.0157 0.0157 0.0157 ...
$ TT_Hr_Med : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Med_F : num 0.0108 0.0108 0.0108 0.0108 0.0108 ...
$ TT_Hr_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Hr_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TD_Mi_SS_q5 : num NA 0.0252 0.2422 0.7324 0.0794 ...
$ TD_Mi_SS_q95 : num NA 0.626 0.242 1.008 0.176 ...
$ TT_Sec_SS_q5 : num NA 11.9 37 30.5 172.9 ...
$ TT_Sec_SS_q95 : num NA 346.3 37 75.8 189.1 ...
$ TT_Hr_SS_q5 : num NA 0.00331 0.01028 0.00849 0.04803 ...
$ TT_Hr_SS_q95 : num NA 0.0962 0.0103 0.0211 0.0525 ...
$ TD_Mi_SS_Mean : num NaN 0.437 0.242 0.908 0.128 ...
$ TD_Mi_SS_Mean_F : num NaN 0.457 0.242 0.977 NaN ...
$ TD_Mi_SS_Med : num NA 0.512 0.242 0.962 0.128 ...
$ TD_Mi_SS_Med_F : num NA 0.512 0.242 1.008 NA ...
$ TD_Mi_SS_Cnt : int 0 14 1 4 2 87 22 118 91 11 ...
$ TD_Mi_SS_Cnt_F : int 0 12 1 3 0 77 18 106 81 9 ...
$ TT_Sec_SS_Mean : num NaN 215.8 37 58.2 181 ...
$ TT_Sec_SS_Mean_F : num NaN 218.9 37 65.5 NaN ...
$ TT_Sec_SS_Med : num NA 223.5 37 65.5 181 ...
$ TT_Sec_SS_Med_F : num NA 223.5 37 65.5 NA ...
$ TT_Sec_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Sec_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
$ TT_Hr_SS_Mean : num NaN 0.0599 0.0103 0.0162 0.0503 ...
$ TT_Hr_SS_Mean_F : num NaN 0.0608 0.0103 0.0182 NaN ...
$ TT_Hr_SS_Med : num NA 0.0621 0.0103 0.0182 0.0503 ...
$ TT_Hr_SS_Med_F : num NA 0.0621 0.0103 0.0182 NA ...
$ TT_Hr_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Hr_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
$ TD_Mi_SSHG_q5 : num NA 0.0996 0.2422 0.7002 0.1816 ...
$ TD_Mi_SSHG_q95 : num NA 0.627 0.242 0.7 0.182 ...
$ TT_Sec_SSHG_q5 : num NA 59.6 37 25 190 11.6 236 51.5 55 8.8 ...
$ TT_Sec_SSHG_q95 : num NA 276 37 25 190 ...
$ TT_Hr_SSHG_q5 : num NA 0.01656 0.01028 0.00694 0.05278 ...
[list output truncated]
View(filter(AllDays_DirChange,
between(RowNum_OG, 2570060, 2570080)
) %>%
select(-matches("(q(2|5|(95)|(98)))|Mean|Med|Cnt")
)
)
Re-ordering the variables to ease with comprehension.
AllDays_NewOrder <- select(AllDays_DirChange,
RowNum_OG,
group,
StartStop_ID,
BusDay_EventNum,
Bus_ID,
Route,
RteChange2,
RouteAlt,
# RouteAlt_Lag1,
DirChange2,
Route_Direction,
Stop_Sequence,
Start_ID,
Start_Desc,
# Stop_ID,
StopID_Clean,
StopID_Indicator,
Stop_Desc,
Event_Type,
Event_Description,
Event_Time_Yr,
Event_Time_Mth,
Event_Time_Date,
Event_Time_Day,
Event_Time_Hr,
Event_Time_HrGroup,
Event_Time_Min,
Event_Time,
Departure_Time,
Dwell_Time,
Dwell_Time2,
Delta_Time,
Latitude,
Longitude,
Heading,
Odometer_Distance,
Odometer_Distance_Lag1,
Odometer_Distance_Mi,
TravelDistance_Ft,
TravelDistance_Mi,
TravelDistance_Mi_Hvrs,
TD_Mi_q2,
TD_Mi_q98,
TD_Mi_SS_q5,
TD_Mi_SS_q95,
TD_Mi_SSHG_q5,
TD_Mi_SSHG_q95,
TD_Mi_Mean,
TD_Mi_Mean_F,
TD_Mi_SS_Mean,
TD_Mi_SS_Mean_F,
TD_Mi_SSHG_Mean,
TD_Mi_SSHG_Mean_F,
TD_Mi_Med,
TD_Mi_Med_F,
TD_Mi_SS_Med,
TD_Mi_SS_Med_F,
TD_Mi_SSHG_Med,
TD_Mi_SSHG_Med_F,
TD_Mi_Cnt,
TD_Mi_Cnt_F,
TD_Mi_SS_Cnt,
TD_Mi_SS_Cnt_F,
TD_Mi_SSHG_Cnt,
TD_Mi_SSHG_Cnt_F,
TravelTime_Sec,
TT_Sec_q2,
TT_Sec_q98,
TT_Sec_SS_q5,
TT_Sec_SS_q95,
TT_Sec_SSHG_q5,
TT_Sec_SSHG_q95,
TT_Sec_Mean,
TT_Sec_Mean_F,
TT_Sec_SS_Mean,
TT_Sec_SS_Mean_F,
TT_Sec_SSHG_Mean,
TT_Sec_SSHG_Mean_F,
TT_Sec_Med,
TT_Sec_Med_F,
TT_Sec_SS_Med,
TT_Sec_SS_Med_F,
TT_Sec_SSHG_Med,
TT_Sec_SSHG_Med_F,
TT_Sec_Cnt,
TT_Sec_Cnt_F,
TT_Sec_SS_Cnt,
TT_Sec_SS_Cnt_F,
TT_Sec_SSHG_Cnt,
TT_Sec_SSHG_Cnt_F,
TravelTime_Hr,
TT_Hr_q2,
TT_Hr_q98,
TT_Hr_SS_q5,
TT_Hr_SS_q95,
TT_Hr_SSHG_q5,
TT_Hr_SSHG_q95,
TT_Hr_Mean,
TT_Hr_Mean_F,
TT_Hr_SS_Mean,
TT_Hr_SS_Mean_F,
TT_Hr_SSHG_Mean,
TT_Hr_SSHG_Mean_F,
TT_Hr_Med,
TT_Hr_Med_F,
TT_Hr_SS_Med,
TT_Hr_SS_Med_F,
TT_Hr_SSHG_Med,
TT_Hr_SSHG_Med_F,
TT_Hr_Cnt,
TT_Hr_Cnt_F,
TT_Hr_SS_Cnt,
TT_Hr_SS_Cnt_F,
TT_Hr_SSHG_Cnt,
TT_Hr_SSHG_Cnt_F,
SpeedAvg_Mph
)
rm(AllDays_DirChange)
str(AllDays_NewOrder)
'data.frame': 2809529 obs. of 114 variables:
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ StartStop_ID : chr "NULL--5004572" "5004572--5004573" "5004573--5002210" "5002210--5002209" ...
$ BusDay_EventNum : int 1 2 3 4 5 6 7 8 9 10 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RteChange2 : Factor w/ 2 levels "Change","Same": 1 2 2 2 2 2 2 2 2 2 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ DirChange2 : Factor w/ 2 levels "Change","Same": 1 2 2 2 2 2 1 2 2 2 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Start_ID : chr NA "5004572" "5004573" "5002210" ...
$ Start_Desc : chr NA "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Odometer_Distance_Lag1: int NA 43543 45139 46418 50115 51074 51303 55633 56163 56285 ...
$ Odometer_Distance_Mi : num 8.25 8.55 8.79 9.49 9.67 ...
$ TravelDistance_Ft : int NA 1596 1279 3697 959 229 4330 530 122 977 ...
$ TravelDistance_Mi : num NA 0.302 0.242 0.7 0.182 ...
$ TravelDistance_Mi_Hvrs: num NA 0.15 0.105 0.165 0.832 ...
$ TD_Mi_q2 : num 0.0521 0.0521 0.0521 0.0521 0.0521 ...
$ TD_Mi_q98 : num 0.959 0.959 0.959 0.959 0.959 ...
$ TD_Mi_SS_q5 : num NA 0.0252 0.2422 0.7324 0.0794 ...
$ TD_Mi_SS_q95 : num NA 0.626 0.242 1.008 0.176 ...
$ TD_Mi_SSHG_q5 : num NA 0.0996 0.2422 0.7002 0.1816 ...
$ TD_Mi_SSHG_q95 : num NA 0.627 0.242 0.7 0.182 ...
$ TD_Mi_Mean : num 0.308 0.308 0.308 0.308 0.308 ...
$ TD_Mi_Mean_F : num 0.232 0.232 0.232 0.232 0.232 ...
$ TD_Mi_SS_Mean : num NaN 0.437 0.242 0.908 0.128 ...
$ TD_Mi_SS_Mean_F : num NaN 0.457 0.242 0.977 NaN ...
$ TD_Mi_SSHG_Mean : num NaN 0.442 0.242 0.7 0.182 ...
$ TD_Mi_SSHG_Mean_F : num NaN 0.491 0.242 0.7 0.182 ...
$ TD_Mi_Med : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Med_F : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_SS_Med : num NA 0.512 0.242 0.962 0.128 ...
$ TD_Mi_SS_Med_F : num NA 0.512 0.242 1.008 NA ...
$ TD_Mi_SSHG_Med : num NA 0.512 0.242 0.7 0.182 ...
$ TD_Mi_SSHG_Med_F : num NA 0.512 0.242 0.7 0.182 ...
$ TD_Mi_Cnt : int 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 ...
$ TD_Mi_Cnt_F : int 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 ...
$ TD_Mi_SS_Cnt : int 0 14 1 4 2 87 22 118 91 11 ...
$ TD_Mi_SS_Cnt_F : int 0 12 1 3 0 77 18 106 81 9 ...
$ TD_Mi_SSHG_Cnt : int 0 7 1 1 1 23 6 29 28 3 ...
$ TD_Mi_SSHG_Cnt_F : int 0 5 1 1 1 19 4 25 24 1 ...
$ TravelTime_Sec : num NA 180 37 25 190 29 288 52 76 8 ...
$ TT_Sec_q2 : num 10 10 10 10 10 10 10 10 10 10 ...
$ TT_Sec_q98 : num 349 349 349 349 349 349 349 349 349 349 ...
$ TT_Sec_SS_q5 : num NA 11.9 37 30.5 172.9 ...
$ TT_Sec_SS_q95 : num NA 346.3 37 75.8 189.1 ...
$ TT_Sec_SSHG_q5 : num NA 59.6 37 25 190 11.6 236 51.5 55 8.8 ...
$ TT_Sec_SSHG_q95 : num NA 276 37 25 190 ...
$ TT_Sec_Mean : num 105 105 105 105 105 ...
$ TT_Sec_Mean_F : num 56.6 56.6 56.6 56.6 56.6 ...
$ TT_Sec_SS_Mean : num NaN 215.8 37 58.2 181 ...
$ TT_Sec_SS_Mean_F : num NaN 218.9 37 65.5 NaN ...
$ TT_Sec_SSHG_Mean : num NaN 202 37 25 190 ...
$ TT_Sec_SSHG_Mean_F : num NaN 226 37 25 190 ...
$ TT_Sec_Med : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Med_F : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_SS_Med : num NA 223.5 37 65.5 181 ...
$ TT_Sec_SS_Med_F : num NA 223.5 37 65.5 NA ...
$ TT_Sec_SSHG_Med : num NA 219 37 25 190 134 286 60 65 16 ...
$ TT_Sec_SSHG_Med_F : num NA 219 37 25 190 134 286 60 65 16 ...
$ TT_Sec_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Sec_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TT_Sec_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Sec_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
$ TT_Sec_SSHG_Cnt : int 0 7 1 1 1 35 6 36 35 3 ...
$ TT_Sec_SSHG_Cnt_F : int 0 5 1 1 1 31 4 32 32 1 ...
$ TravelTime_Hr : num NA 0.05 0.01028 0.00694 0.05278 ...
$ TT_Hr_q2 : num 0.00278 0.00278 0.00278 0.00278 0.00278 ...
$ TT_Hr_q98 : num 0.0969 0.0969 0.0969 0.0969 0.0969 ...
$ TT_Hr_SS_q5 : num NA 0.00331 0.01028 0.00849 0.04803 ...
$ TT_Hr_SS_q95 : num NA 0.0962 0.0103 0.0211 0.0525 ...
$ TT_Hr_SSHG_q5 : num NA 0.01656 0.01028 0.00694 0.05278 ...
$ TT_Hr_SSHG_q95 : num NA 0.07653 0.01028 0.00694 0.05278 ...
$ TT_Hr_Mean : num 0.0291 0.0291 0.0291 0.0291 0.0291 ...
$ TT_Hr_Mean_F : num 0.0157 0.0157 0.0157 0.0157 0.0157 ...
$ TT_Hr_SS_Mean : num NaN 0.0599 0.0103 0.0162 0.0503 ...
$ TT_Hr_SS_Mean_F : num NaN 0.0608 0.0103 0.0182 NaN ...
[list output truncated]
# View(head(AllDays_NewOrder, 500))
# View(tail(AllDays_NewOrder, 500))
Summarizing the data to help spot anomolies.
summary(AllDays_NewOrder)
RowNum_OG group StartStop_ID BusDay_EventNum Bus_ID
Min. : 1 1:559521 Length:2809529 Min. : 1.0 Min. : 11
1st Qu.: 784722 2:561389 Class :character 1st Qu.: 113.0 1st Qu.:2922
Median :1563300 3:567794 Mode :character Median : 248.0 Median :6195
Mean :1562504 4:559180 Mean : 290.5 Mean :5382
3rd Qu.:2337981 5:561645 3rd Qu.: 428.0 3rd Qu.:7104
Max. :3119443 Max. :1344.0 Max. :8105
Route RteChange2 RouteAlt DirChange2
Length:2809529 Change: 23772 2 :1128810 Change: 65126
Class :character Same :2785757 1 :1065425 Same :2744403
Mode :character 3 : 260372
4 : 130801
5 : 75039
6 : 56408
(Other): 92674
Route_Direction Stop_Sequence Start_ID Start_Desc
SOUTH :739235 Min. : 1.00 Length:2809529 Length:2809529
NORTH :735203 1st Qu.: 12.00 Class :character Class :character
WEST :649706 Median : 24.00 Mode :character Mode :character
EAST :628074 Mean : 26.83
LOOP : 35611 3rd Qu.: 39.00
CLOCKWIS: 10671 Max. :104.00
(Other) : 11029
StopID_Clean StopID_Indicator Stop_Desc Event_Type
Length:2809529 ID_Bad: 18948 Length:2809529 Min. :3.0
Class :character ID_OK :2790581 Class :character 1st Qu.:3.0
Mode :character Mode :character Median :4.0
Mean :3.6
3rd Qu.:4.0
Max. :5.0
Event_Description Event_Time_Yr
Serviced Stop :1127366 Min. :2016
Unknown Stop : 2579 1st Qu.:2016
UnServiced Stop :1679584 Median :2016
Mean :2016
3rd Qu.:2016
Max. :2016
Event_Time_Mth Event_Time_Date Event_Time_Day Event_Time_Hr Event_Time_HrGroup
Min. :10 Min. :3.000 Sun : 0 Min. : 0.00 Group6_8 :611612
1st Qu.:10 1st Qu.:4.000 Mon :559521 1st Qu.: 8.00 Group15_17:560103
Median :10 Median :5.000 Tues :561389 Median :13.00 Group18_20:461056
Mean :10 Mean :5.001 Wed :567794 Mean :12.97 Group9_11 :396514
3rd Qu.:10 3rd Qu.:6.000 Thurs:559180 3rd Qu.:18.00 Group12_14:353603
Max. :10 Max. :7.000 Fri :561645 Max. :23.00 Group21_23:244522
Sat : 0 (Other) :182119
Event_Time_Min Event_Time Departure_Time
Min. : 0.00 Min. :2016-10-03 00:00:00 Min. :2016-10-03 00:00:00
1st Qu.:14.00 1st Qu.:2016-10-04 08:36:14 1st Qu.:2016-10-04 08:36:20
Median :29.00 Median :2016-10-05 13:49:29 Median :2016-10-05 13:49:38
Mean :29.43 Mean :2016-10-05 13:29:21 Mean :2016-10-05 13:29:28
3rd Qu.:44.00 3rd Qu.:2016-10-06 17:58:06 3rd Qu.:2016-10-06 17:58:13
Max. :59.00 Max. :2016-10-07 23:59:59 Max. :2016-10-08 00:12:31
Dwell_Time Dwell_Time2 Delta_Time Latitude
Min. : 0.00 Min. : 0.000 Min. :-5606.0 Min. : 0.00
1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 14.0 1st Qu.:38.86
Median : 0.00 Median : 0.000 Median : 157.0 Median :38.90
Mean : 12.56 Mean : 6.359 Mean : 268.8 Mean :38.91
3rd Qu.: 5.00 3rd Qu.: 4.000 3rd Qu.: 396.0 3rd Qu.:38.96
Max. :6205.00 Max. :6205.000 Max. : 9426.0 Max. :39.19
Longitude Heading Odometer_Distance Odometer_Distance_Lag1
Min. :-77.45 Min. : 0.0 Min. : 0 Min. : 0
1st Qu.:-77.07 1st Qu.: 89.0 1st Qu.: 177595 1st Qu.: 177326
Median :-77.02 Median :180.0 Median : 377510 Median : 376934
Mean :-77.02 Mean :176.9 Mean : 426254 Mean : 425713
3rd Qu.:-76.97 3rd Qu.:269.0 3rd Qu.: 623667 3rd Qu.: 622879
Max. : 0.00 Max. :360.0 Max. :11108034 Max. :10853226
NA's :6528
Odometer_Distance_Mi TravelDistance_Ft TravelDistance_Mi TravelDistance_Mi_Hvrs
Min. : 0.00 Min. : 1 Min. : 0.0 Min. : 0.000
1st Qu.: 33.64 1st Qu.: 699 1st Qu.: 0.1 1st Qu.: 0.106
Median : 71.50 Median : 1044 Median : 0.2 Median : 0.142
Mean : 80.73 Mean : 1624 Mean : 0.3 Mean : 0.201
3rd Qu.: 118.12 3rd Qu.: 1518 3rd Qu.: 0.3 3rd Qu.: 0.193
Max. :2103.79 Max. :1323464 Max. :250.7 Max. :24.407
NA's :322734 NA's :322734 NA's :6528
TD_Mi_q2 TD_Mi_q98 TD_Mi_SS_q5 TD_Mi_SS_q95
Min. :0.05208 Min. :0.9585 Min. : 0.000 Min. : 0.000
1st Qu.:0.05208 1st Qu.:0.9585 1st Qu.: 0.086 1st Qu.: 0.262
Median :0.05208 Median :0.9585 Median : 0.104 Median : 0.326
Mean :0.05208 Mean :0.9585 Mean : 0.164 Mean : 0.488
3rd Qu.:0.05208 3rd Qu.:0.9585 3rd Qu.: 0.139 3rd Qu.: 0.436
Max. :0.05208 Max. :0.9585 Max. :219.163 Max. :246.949
NA's :24757 NA's :24757
TD_Mi_SSHG_q5 TD_Mi_SSHG_q95 TD_Mi_Mean TD_Mi_Mean_F
Min. : 0.00 Min. : 0.00 Min. :0.3076 Min. :0.2318
1st Qu.: 0.09 1st Qu.: 0.25 1st Qu.:0.3076 1st Qu.:0.2318
Median : 0.11 Median : 0.31 Median :0.3076 Median :0.2318
Mean : 0.18 Mean : 0.47 Mean :0.3076 Mean :0.2318
3rd Qu.: 0.15 3rd Qu.: 0.42 3rd Qu.:0.3076 3rd Qu.:0.2318
Max. :250.66 Max. :250.66 Max. :0.3076 Max. :0.2318
NA's :35629 NA's :35629
TD_Mi_SS_Mean TD_Mi_SS_Mean_F TD_Mi_SSHG_Mean TD_Mi_SSHG_Mean_F
Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.00
1st Qu.: 0.172 1st Qu.: 0.166 1st Qu.: 0.17 1st Qu.: 0.16
Median : 0.212 Median : 0.207 Median : 0.21 Median : 0.21
Mean : 0.307 Mean : 0.291 Mean : 0.31 Mean : 0.29
3rd Qu.: 0.267 3rd Qu.: 0.260 3rd Qu.: 0.27 3rd Qu.: 0.26
Max. :219.163 Max. :219.163 Max. :250.66 Max. :250.66
NA's :24757 NA's :27919 NA's :35629 NA's :44458
TD_Mi_Med TD_Mi_Med_F TD_Mi_SS_Med TD_Mi_SS_Med_F
Min. :0.1977 Min. :0.1977 Min. : 0.000 Min. : 0.000
1st Qu.:0.1977 1st Qu.:0.1977 1st Qu.: 0.146 1st Qu.: 0.146
Median :0.1977 Median :0.1977 Median : 0.196 Median : 0.196
Mean :0.1977 Mean :0.1977 Mean : 0.288 Mean : 0.282
3rd Qu.:0.1977 3rd Qu.:0.1977 3rd Qu.: 0.265 3rd Qu.: 0.265
Max. :0.1977 Max. :0.1977 Max. :219.163 Max. :219.163
NA's :24757 NA's :27919
TD_Mi_SSHG_Med TD_Mi_SSHG_Med_F TD_Mi_Cnt TD_Mi_Cnt_F
Min. : 0.00 Min. : 0.00 Min. :2486795 Min. :2387406
1st Qu.: 0.14 1st Qu.: 0.14 1st Qu.:2486795 1st Qu.:2387406
Median : 0.20 Median : 0.20 Median :2486795 Median :2387406
Mean : 0.29 Mean : 0.28 Mean :2486795 Mean :2387406
3rd Qu.: 0.27 3rd Qu.: 0.27 3rd Qu.:2486795 3rd Qu.:2387406
Max. :250.66 Max. :250.66 Max. :2486795 Max. :2387406
NA's :35629 NA's :44458
TD_Mi_SS_Cnt TD_Mi_SS_Cnt_F TD_Mi_SSHG_Cnt TD_Mi_SSHG_Cnt_F
Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00
1st Qu.: 163.0 1st Qu.: 146.0 1st Qu.: 26.00 1st Qu.: 22.00
Median : 280.0 Median : 252.0 Median : 45.00 Median : 39.00
Mean : 347.4 Mean : 312.7 Mean : 57.27 Mean : 50.85
3rd Qu.: 456.0 3rd Qu.: 411.0 3rd Qu.: 75.00 3rd Qu.: 67.00
Max. :1543.0 Max. :1388.0 Max. :663.00 Max. :595.00
TravelTime_Sec TT_Sec_q2 TT_Sec_q98 TT_Sec_SS_q5 TT_Sec_SS_q95
Min. : 1.0 Min. :10 Min. :349 Min. : 1.00 Min. : 1.00
1st Qu.: 25.0 1st Qu.:10 1st Qu.:349 1st Qu.: 15.00 1st Qu.: 48.00
Median : 39.0 Median :10 Median :349 Median : 22.00 Median : 80.05
Mean : 104.9 Mean :10 Mean :349 Mean : 61.26 Mean : 183.28
3rd Qu.: 72.0 3rd Qu.:10 3rd Qu.:349 3rd Qu.: 34.00 3rd Qu.: 134.60
Max. :60750.0 Max. :10 Max. :349 Max. :60750.00 Max. :60750.00
NA's :6641 NA's :6531 NA's :6531
TT_Sec_SSHG_q5 TT_Sec_SSHG_q95 TT_Sec_Mean TT_Sec_Mean_F
Min. : 1.00 Min. : 1.00 Min. :104.9 Min. :56.61
1st Qu.: 16.00 1st Qu.: 43.80 1st Qu.:104.9 1st Qu.:56.61
Median : 23.40 Median : 72.95 Median :104.9 Median :56.61
Mean : 67.33 Mean : 169.21 Mean :104.9 Mean :56.61
3rd Qu.: 36.70 3rd Qu.: 123.65 3rd Qu.:104.9 3rd Qu.:56.61
Max. :60750.00 Max. :60750.00 Max. :104.9 Max. :56.61
NA's :6535 NA's :6535
TT_Sec_SS_Mean TT_Sec_SS_Mean_F TT_Sec_SSHG_Mean TT_Sec_SSHG_Mean_F
Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
1st Qu.: 29.06 1st Qu.: 27.54 1st Qu.: 28.38 1st Qu.: 27.21
Median : 44.16 Median : 41.91 Median : 43.38 Median : 41.48
Mean : 104.88 Mean : 91.34 Mean : 104.88 Mean : 93.53
3rd Qu.: 73.30 3rd Qu.: 69.25 3rd Qu.: 72.93 3rd Qu.: 70.12
Max. :60750.00 Max. :60750.00 Max. :60750.00 Max. :60750.00
NA's :6531 NA's :10519 NA's :6535 NA's :12811
TT_Sec_Med TT_Sec_Med_F TT_Sec_SS_Med TT_Sec_SS_Med_F
Min. :39 Min. :39 Min. : 1.00 Min. : 1.00
1st Qu.:39 1st Qu.:39 1st Qu.: 26.00 1st Qu.: 26.00
Median :39 Median :39 Median : 39.00 Median : 39.00
Mean :39 Mean :39 Mean : 91.55 Mean : 84.82
3rd Qu.:39 3rd Qu.:39 3rd Qu.: 65.00 3rd Qu.: 65.00
Max. :39 Max. :39 Max. :60750.00 Max. :60750.00
NA's :6531 NA's :10519
TT_Sec_SSHG_Med TT_Sec_SSHG_Med_F TT_Sec_Cnt TT_Sec_Cnt_F
Min. : 1.00 Min. : 1.00 Min. :2802888 Min. :2705189
1st Qu.: 26.00 1st Qu.: 26.00 1st Qu.:2802888 1st Qu.:2705189
Median : 39.00 Median : 38.50 Median :2802888 Median :2705189
Mean : 94.94 Mean : 88.44 Mean :2802888 Mean :2705189
3rd Qu.: 67.00 3rd Qu.: 66.50 3rd Qu.:2802888 3rd Qu.:2705189
Max. :60750.00 Max. :60750.00 Max. :2802888 Max. :2705189
NA's :6535 NA's :12811
TT_Sec_SS_Cnt TT_Sec_SS_Cnt_F TT_Sec_SSHG_Cnt TT_Sec_SSHG_Cnt_F
Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00
1st Qu.: 194.0 1st Qu.: 177.0 1st Qu.: 29.00 1st Qu.: 26.00
Median : 310.0 Median : 282.0 Median : 51.00 Median : 46.00
Mean : 384.4 Mean : 349.8 Mean : 63.46 Mean : 57.09
3rd Qu.: 497.0 3rd Qu.: 452.0 3rd Qu.: 83.00 3rd Qu.: 74.00
Max. :1664.0 Max. :1523.0 Max. :691.00 Max. :634.00
TravelTime_Hr TT_Hr_q2 TT_Hr_q98 TT_Hr_SS_q5
Min. : 0.000 Min. :0.002778 Min. :0.09694 Min. : 0.000
1st Qu.: 0.007 1st Qu.:0.002778 1st Qu.:0.09694 1st Qu.: 0.004
Median : 0.011 Median :0.002778 Median :0.09694 Median : 0.006
Mean : 0.029 Mean :0.002778 Mean :0.09694 Mean : 0.017
3rd Qu.: 0.020 3rd Qu.:0.002778 3rd Qu.:0.09694 3rd Qu.: 0.009
Max. :16.875 Max. :0.002778 Max. :0.09694 Max. :16.875
NA's :6641 NA's :6531
TT_Hr_SS_q95 TT_Hr_SSHG_q5 TT_Hr_SSHG_q95 TT_Hr_Mean
Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.02913
1st Qu.: 0.013 1st Qu.: 0.004 1st Qu.: 0.012 1st Qu.:0.02913
Median : 0.022 Median : 0.006 Median : 0.020 Median :0.02913
Mean : 0.051 Mean : 0.019 Mean : 0.047 Mean :0.02913
3rd Qu.: 0.037 3rd Qu.: 0.010 3rd Qu.: 0.034 3rd Qu.:0.02913
Max. :16.875 Max. :16.875 Max. :16.875 Max. :0.02913
NA's :6531 NA's :6535 NA's :6535
TT_Hr_Mean_F TT_Hr_SS_Mean TT_Hr_SS_Mean_F TT_Hr_SSHG_Mean
Min. :0.01573 Min. : 0.000 Min. : 0.000 Min. : 0.000
1st Qu.:0.01573 1st Qu.: 0.008 1st Qu.: 0.008 1st Qu.: 0.008
Median :0.01573 Median : 0.012 Median : 0.012 Median : 0.012
Mean :0.01573 Mean : 0.029 Mean : 0.025 Mean : 0.029
3rd Qu.:0.01573 3rd Qu.: 0.020 3rd Qu.: 0.019 3rd Qu.: 0.020
Max. :0.01573 Max. :16.875 Max. :16.875 Max. :16.875
NA's :6531 NA's :10532 NA's :6535
TT_Hr_SSHG_Mean_F TT_Hr_Med TT_Hr_Med_F TT_Hr_SS_Med
Min. : 0.000 Min. :0.01083 Min. :0.01083 Min. : 0.000
1st Qu.: 0.008 1st Qu.:0.01083 1st Qu.:0.01083 1st Qu.: 0.007
Median : 0.012 Median :0.01083 Median :0.01083 Median : 0.011
Mean : 0.026 Mean :0.01083 Mean :0.01083 Mean : 0.025
3rd Qu.: 0.019 3rd Qu.:0.01083 3rd Qu.:0.01083 3rd Qu.: 0.018
Max. :16.875 Max. :0.01083 Max. :0.01083 Max. :16.875
NA's :12895 NA's :6531
TT_Hr_SS_Med_F TT_Hr_SSHG_Med TT_Hr_SSHG_Med_F TT_Hr_Cnt
Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :2802888
1st Qu.: 0.007 1st Qu.: 0.007 1st Qu.: 0.007 1st Qu.:2802888
Median : 0.011 Median : 0.011 Median : 0.011 Median :2802888
Mean : 0.024 Mean : 0.026 Mean : 0.025 Mean :2802888
3rd Qu.: 0.018 3rd Qu.: 0.019 3rd Qu.: 0.018 3rd Qu.:2802888
Max. :16.875 Max. :16.875 Max. :16.875 Max. :2802888
NA's :10532 NA's :6535 NA's :12895
TT_Hr_Cnt_F TT_Hr_SS_Cnt TT_Hr_SS_Cnt_F TT_Hr_SSHG_Cnt
Min. :2705189 Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.:2705189 1st Qu.: 194.0 1st Qu.: 176.0 1st Qu.: 29.00
Median :2705189 Median : 310.0 Median : 282.0 Median : 51.00
Mean :2705189 Mean : 384.4 Mean : 349.6 Mean : 63.46
3rd Qu.:2705189 3rd Qu.: 497.0 3rd Qu.: 452.0 3rd Qu.: 83.00
Max. :2705189 Max. :1664.0 Max. :1523.0 Max. :691.00
TT_Hr_SSHG_Cnt_F SpeedAvg_Mph
Min. : 0.00 Min. : 0.0
1st Qu.: 26.00 1st Qu.: 10.1
Median : 46.00 Median : 16.7
Mean : 57.05 Mean : 26.5
3rd Qu.: 74.00 3rd Qu.: 31.2
Max. :634.00 Max. :22924.1
NA's :322762
Investigation of TravelDistance_Mi.
View(TravDistMi_Pctiles): 99% of TravelDistance_Mi are about 1 mile or less…but some weird TravelDistance_Mi values (e.g., 584 miles traveled) exist.
TravDistMi_Ntile <- as.data.frame(AllDays_NewOrder$TravelDistance_Mi) %>%
mutate(#Pctile = ntile(AllDays_NewOrder$TravelDistance_Mi, 100),
#MinR = min_rank(AllDays_NewOrder$TravelDistance_Mi),
PctR = percent_rank(AllDays_NewOrder$TravelDistance_Mi),
PctR_Round = round(PctR, 2)
)
colnames(TravDistMi_Ntile)[1] <- "TravelDistance_Mi"
# str(TravDistMi_Ntile)
TravDistMi_Ntile_Rows <- nrow(TravDistMi_Ntile)
# View(tail(TravDistMi_Ntile, 500))
TravDistMi_Pctiles <- group_by(TravDistMi_Ntile,
PctR_Round
) %>%
summarise(
MinTravDistMiAtPctile = min(TravelDistance_Mi),
CntsAtPctile = n(),
PctsAtPctile = CntsAtPctile / TravDistMi_Ntile_Rows
) %>%
mutate(CumSumPAtP = cumsum(PctsAtPctile)
)
rm(TravDistMi_Ntile)
rm(TravDistMi_Ntile_Rows)
View(TravDistMi_Pctiles)
TravDistMi_Pctiles
Investigation of TravelDistance_Mi.
Why are some TravelDistance_Mi “NA”? It looks like partially because the records are the first trip of the day (for that bus), so I purposefully set the distance to “NA”. Another reason is due to the odometer recording a value less than the previous odometer recording. In most cases, I have no explanation for this - though I have observed about 67% of all instances where TravelDistance_Mi is NA (other than because it’s the first record of the day) are instances where DirChange2 is “Change”. This is weird and should be asked to WMATA.
# View(head(AllDays_NewOrder, 500))
View(filter(AllDays_NewOrder,
BusDay_EventNum != 1 # When BusDay_EventNum == 1, TravelDistance_Mi is NA by design (don't want to calculate distance based on yesterday's position)
) %>%
group_by(StartStop_ID) %>%
summarise(Cnts = sum(is.na(TravelDistance_Mi)
)
) %>%
arrange(desc(Cnts)
)
)
View(filter(AllDays_NewOrder,
StartStop_ID == "1000245--1000211"
) %>%
select(RowNum_OG,
StartStop_ID,
Event_Time,
Event_Time_HrGroup,
Bus_ID,
TravelDistance_Mi,
TravelDistance_Mi_Hvrs,
TD_Mi_SS_Mean,
TD_Mi_SS_Mean_F,
TD_Mi_SSHG_Mean,
TD_Mi_SSHG_Mean_F,
TD_Mi_SS_Med,
TD_Mi_SS_Med_F,
TD_Mi_SSHG_Med,
TD_Mi_SSHG_Med_F,
TD_Mi_SS_Cnt,
TD_Mi_SS_Cnt_F,
TD_Mi_SSHG_Cnt,
TD_Mi_SSHG_Cnt_F
) %>%
mutate(Ratio_MeanToHvrs = TD_Mi_SS_Mean / TravelDistance_Mi_Hvrs) %>%
arrange(Event_Time)
)
View(filter(AllDays_NewOrder,
is.na(TravelDistance_Mi)
)
)
# These records are NA becuase the record is the first record of the day (the Event_Time_Date)
View(filter(AllDays_NewOrder,
between(RowNum_OG, 326, 346) | # 336
between(RowNum_OG, 591, 611) | # 601
between(RowNum_OG, 845, 865) # 855
)
)
Investigation of TravelDistance_Mi.
These records are NA becuase the current record odometer is less than the previous record odometer. Theoretically, this should NOT happen. Me: it appears that about 67% of all instances where TravelDistance_Mi is NA (other than because it’s th first record of the day) are instances where DirChange2 is “Change”. This is weird and should be asked to WMATA.
View(filter(AllDays_NewOrder,
between(RowNum_OG, 194, 214) | # 204
between(RowNum_OG, 440, 460) | # 450
between(RowNum_OG, 478, 498) | # 488
between(RowNum_OG, 510, 530) # 520
)
)
TestTable <- filter(AllDays_NewOrder,
BusDay_EventNum != 1
) %>%
mutate(TravelDistance_NA = as.factor(ifelse(is.na(TravelDistance_Mi),
"True",
"False"
)
)
) %>%
group_by(DirChange2, TravelDistance_NA) %>%
summarise(TravDistMi_NACnts = n()
)
# TestTable
TestTable_Spread <- as.data.frame(spread(TestTable,
TravelDistance_NA,
TravDistMi_NACnts
)
) %>%
select(False,
True
)
row.names(TestTable_Spread) <- c("Change", "Same")
# str(TestTable_Spread)
# TestTable_Spread
prop.table(as.table(as.matrix(TestTable_Spread)
),
1
)
False True
Change 0.8267006 0.1732994
Same 0.8884818 0.1115182
prop.table(as.table(as.matrix(TestTable_Spread)
),
2
)
False True
Change 0.01948009 0.03211514
Same 0.98051991 0.96788486
Investigation of TravelDistance_Mi.
Let’s look at just the TravelDistance_Mi values that are NOT “NA”.
rm(TestTable)
rm(TestTable_Spread)
TravelDistance_Mi_NoNA <- filter(AllDays_NewOrder,
# TravelDistance_Mi != 0 &
!is.na(TravelDistance_Mi)
)
dim(AllDays_NewOrder)
[1] 2809529 114
dim(TravelDistance_Mi_NoNA)
[1] 2486795 114
nrow(AllDays_NewOrder) - nrow(TravelDistance_Mi_NoNA)
[1] 322734
str(TravelDistance_Mi_NoNA)
'data.frame': 2486795 obs. of 114 variables:
$ RowNum_OG : int 3 4 5 6 7 9 10 11 12 13 ...
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ StartStop_ID : chr "5004572--5004573" "5004573--5002210" "5002210--5002209" "5002209--5000070" ...
$ BusDay_EventNum : int 2 3 4 5 6 7 8 9 10 11 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RteChange2 : Factor w/ 2 levels "Change","Same": 2 2 2 2 2 2 2 2 2 2 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 6 6 6 6 6 ...
$ DirChange2 : Factor w/ 2 levels "Change","Same": 2 2 2 2 2 1 2 2 2 2 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 6 3 2 8 1 2 3 4 2 6 ...
$ Start_ID : chr "5004572" "5004573" "5002210" "5002209" ...
$ Start_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ StopID_Clean : chr "5004573" "5002210" "5002209" "5000070" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Stop_Desc : chr "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" "FRANCONIA-SPRGFLD STA. + BUS BAY D" ...
$ Event_Type : int 4 4 4 3 3 4 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 1 1 3 3 3 3 3 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Min : int 9 10 10 13 14 21 21 23 23 26 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:09:47" "2016-10-03 06:10:24" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:09:47" "2016-10-03 06:10:24" ...
$ Dwell_Time : int 0 0 0 0 104 0 0 0 0 0 ...
$ Dwell_Time2 : num 0 0 0 0 104 0 0 0 0 0 ...
$ Delta_Time : int 24 165 25 73 719 74 76 63 69 165 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 97 276 15 119 100 274 104 241 274 1 ...
$ Odometer_Distance : int 45139 46418 50115 51074 51303 55633 56163 56285 57262 58363 ...
$ Odometer_Distance_Lag1: int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Odometer_Distance_Mi : num 8.55 8.79 9.49 9.67 9.72 ...
$ TravelDistance_Ft : int 1596 1279 3697 959 229 4330 530 122 977 1101 ...
$ TravelDistance_Mi : num 0.3023 0.2422 0.7002 0.1816 0.0434 ...
$ TravelDistance_Mi_Hvrs: num 0.15 0.105 0.165 0.832 0.068 ...
$ TD_Mi_q2 : num 0.0521 0.0521 0.0521 0.0521 0.0521 ...
$ TD_Mi_q98 : num 0.959 0.959 0.959 0.959 0.959 ...
$ TD_Mi_SS_q5 : num 0.025246 0.242235 0.732434 0.079432 0.000436 ...
$ TD_Mi_SS_q95 : num 0.626 0.242 1.008 0.176 10.435 ...
$ TD_Mi_SSHG_q5 : num 0.09956 0.24223 0.70019 0.18163 0.00269 ...
$ TD_Mi_SSHG_q95 : num 0.627 0.242 0.7 0.182 0.497 ...
$ TD_Mi_Mean : num 0.308 0.308 0.308 0.308 0.308 ...
$ TD_Mi_Mean_F : num 0.232 0.232 0.232 0.232 0.232 ...
$ TD_Mi_SS_Mean : num 0.437 0.242 0.908 0.128 1.166 ...
$ TD_Mi_SS_Mean_F : num 0.457 0.242 0.977 NaN 0.226 ...
$ TD_Mi_SSHG_Mean : num 0.442 0.242 0.7 0.182 0.232 ...
$ TD_Mi_SSHG_Mean_F : num 0.491 0.242 0.7 0.182 0.228 ...
$ TD_Mi_Med : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Med_F : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_SS_Med : num 0.5116 0.2422 0.9616 0.1278 0.0426 ...
$ TD_Mi_SS_Med_F : num 0.5116 0.2422 1.0081 NA 0.0426 ...
$ TD_Mi_SSHG_Med : num 0.512 0.242 0.7 0.182 0.108 ...
$ TD_Mi_SSHG_Med_F : num 0.512 0.242 0.7 0.182 0.108 ...
$ TD_Mi_Cnt : int 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 ...
$ TD_Mi_Cnt_F : int 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 ...
$ TD_Mi_SS_Cnt : int 14 1 4 2 87 22 118 91 11 2 ...
$ TD_Mi_SS_Cnt_F : int 12 1 3 0 77 18 106 81 9 0 ...
$ TD_Mi_SSHG_Cnt : int 7 1 1 1 23 6 29 28 3 1 ...
$ TD_Mi_SSHG_Cnt_F : int 5 1 1 1 19 4 25 24 1 1 ...
$ TravelTime_Sec : num 180 37 25 190 29 288 52 76 8 189 ...
$ TT_Sec_q2 : num 10 10 10 10 10 10 10 10 10 10 ...
$ TT_Sec_q98 : num 349 349 349 349 349 349 349 349 349 349 ...
$ TT_Sec_SS_q5 : num 11.9 37 30.5 172.9 10 ...
$ TT_Sec_SS_q95 : num 346.3 37 75.8 189.1 1737.2 ...
$ TT_Sec_SSHG_q5 : num 59.6 37 25 190 11.6 236 51.5 55 8.8 189 ...
$ TT_Sec_SSHG_q95 : num 276 37 25 190 675 ...
$ TT_Sec_Mean : num 105 105 105 105 105 ...
$ TT_Sec_Mean_F : num 56.6 56.6 56.6 56.6 56.6 ...
$ TT_Sec_SS_Mean : num 215.8 37 58.2 181 585.3 ...
$ TT_Sec_SS_Mean_F : num 218.9 37 65.5 NaN 249.3 ...
$ TT_Sec_SSHG_Mean : num 202 37 25 190 257 ...
$ TT_Sec_SSHG_Mean_F : num 226 37 25 190 244 ...
$ TT_Sec_Med : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Med_F : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_SS_Med : num 223.5 37 65.5 181 33 ...
$ TT_Sec_SS_Med_F : num 223.5 37 65.5 NA 32 ...
$ TT_Sec_SSHG_Med : num 219 37 25 190 134 286 60 65 16 189 ...
$ TT_Sec_SSHG_Med_F : num 219 37 25 190 134 286 60 65 16 189 ...
$ TT_Sec_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Sec_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TT_Sec_SS_Cnt : int 14 1 4 2 173 22 141 141 11 2 ...
$ TT_Sec_SS_Cnt_F : int 12 1 2 0 156 18 127 128 9 0 ...
$ TT_Sec_SSHG_Cnt : int 7 1 1 1 35 6 36 35 3 1 ...
$ TT_Sec_SSHG_Cnt_F : int 5 1 1 1 31 4 32 32 1 1 ...
$ TravelTime_Hr : num 0.05 0.01028 0.00694 0.05278 0.00806 ...
$ TT_Hr_q2 : num 0.00278 0.00278 0.00278 0.00278 0.00278 ...
$ TT_Hr_q98 : num 0.0969 0.0969 0.0969 0.0969 0.0969 ...
$ TT_Hr_SS_q5 : num 0.00331 0.01028 0.00849 0.04803 0.00278 ...
$ TT_Hr_SS_q95 : num 0.0962 0.0103 0.0211 0.0525 0.4826 ...
$ TT_Hr_SSHG_q5 : num 0.01656 0.01028 0.00694 0.05278 0.00322 ...
$ TT_Hr_SSHG_q95 : num 0.07653 0.01028 0.00694 0.05278 0.18739 ...
$ TT_Hr_Mean : num 0.0291 0.0291 0.0291 0.0291 0.0291 ...
$ TT_Hr_Mean_F : num 0.0157 0.0157 0.0157 0.0157 0.0157 ...
$ TT_Hr_SS_Mean : num 0.0599 0.0103 0.0162 0.0503 0.1626 ...
$ TT_Hr_SS_Mean_F : num 0.0608 0.0103 0.0182 NaN 0.0692 ...
[list output truncated]
summary(TravelDistance_Mi_NoNA)
RowNum_OG group StartStop_ID BusDay_EventNum Bus_ID
Min. : 3 1:496190 Length:2486795 Min. : 2.0 Min. : 11
1st Qu.: 786568 2:497932 Class :character 1st Qu.: 115.0 1st Qu.:2923
Median :1590497 3:501611 Mode :character Median : 251.0 Median :6202
Mean :1578192 4:495069 Mean : 293.2 Mean :5431
3rd Qu.:2351264 5:495993 3rd Qu.: 431.0 3rd Qu.:7113
Max. :3119443 Max. :1344.0 Max. :8105
Route RteChange2 RouteAlt DirChange2
Length:2486795 Change: 13709 2 :994645 Change: 48443
Class :character Same :2473086 1 :943279 Same :2438352
Mode :character 3 :229032
4 :117090
5 : 67811
6 : 51391
(Other): 83547
Route_Direction Stop_Sequence Start_ID Start_Desc
SOUTH :667198 Min. : 1.00 Length:2486795 Length:2486795
NORTH :662471 1st Qu.: 12.00 Class :character Class :character
WEST :565616 Median : 24.00 Mode :character Mode :character
EAST :543386 Mean : 27.13
LOOP : 33484 3rd Qu.: 39.00
CLOCKWIS: 7012 Max. :104.00
(Other) : 7628
StopID_Clean StopID_Indicator Stop_Desc Event_Type
Length:2486795 ID_Bad: 14271 Length:2486795 Min. :3.000
Class :character ID_OK :2472524 Class :character 1st Qu.:3.000
Mode :character Mode :character Median :4.000
Mean :3.626
3rd Qu.:4.000
Max. :5.000
Event_Description Event_Time_Yr
Serviced Stop : 930934 Min. :2016
Unknown Stop : 1794 1st Qu.:2016
UnServiced Stop :1554067 Median :2016
Mean :2016
3rd Qu.:2016
Max. :2016
Event_Time_Mth Event_Time_Date Event_Time_Day Event_Time_Hr Event_Time_HrGroup
Min. :10 Min. :3.000 Sun : 0 Min. : 0.00 Group6_8 :538348
1st Qu.:10 1st Qu.:4.000 Mon :496190 1st Qu.: 8.00 Group15_17:497156
Median :10 Median :5.000 Tues :497932 Median :13.00 Group18_20:408957
Mean :10 Mean :4.999 Wed :501611 Mean :12.99 Group9_11 :351804
3rd Qu.:10 3rd Qu.:6.000 Thurs:495069 3rd Qu.:18.00 Group12_14:314050
Max. :10 Max. :7.000 Fri :495993 Max. :23.00 Group21_23:217259
Sat : 0 (Other) :159221
Event_Time_Min Event_Time Departure_Time
Min. : 0.00 Min. :2016-10-03 00:00:09 Min. :2016-10-03 00:00:09
1st Qu.:14.00 1st Qu.:2016-10-04 08:35:52 1st Qu.:2016-10-04 08:35:59
Median :29.00 Median :2016-10-05 13:46:00 Median :2016-10-05 13:46:06
Mean :29.43 Mean :2016-10-05 13:27:43 Mean :2016-10-05 13:27:49
3rd Qu.:44.00 3rd Qu.:2016-10-06 17:57:32 3rd Qu.:2016-10-06 17:57:39
Max. :59.00 Max. :2016-10-07 23:59:59 Max. :2016-10-08 00:12:31
Dwell_Time Dwell_Time2 Delta_Time Latitude
Min. : 0.00 Min. : 0.000 Min. :-5606.0 Min. : 0.00
1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 16.0 1st Qu.:38.86
Median : 0.00 Median : 0.000 Median : 160.0 Median :38.90
Mean : 11.86 Mean : 5.994 Mean : 274.1 Mean :38.91
3rd Qu.: 4.00 3rd Qu.: 4.000 3rd Qu.: 402.0 3rd Qu.:38.96
Max. :6205.00 Max. :6205.000 Max. : 9426.0 Max. :39.19
Longitude Heading Odometer_Distance Odometer_Distance_Lag1
Min. :-77.45 Min. : 0.0 Min. : 1 Min. : 0
1st Qu.:-77.07 1st Qu.: 89.0 1st Qu.: 200268 1st Qu.: 198635
Median :-77.01 Median :180.0 Median : 394700 Median : 393026
Mean :-77.02 Mean :176.7 Mean : 443225 Mean : 441601
3rd Qu.:-76.97 3rd Qu.:269.0 3rd Qu.: 633936 3rd Qu.: 632313
Max. : 0.00 Max. :360.0 Max. :11108034 Max. :10853226
Odometer_Distance_Mi TravelDistance_Ft TravelDistance_Mi TravelDistance_Mi_Hvrs
Min. : 0.0002 Min. : 1 Min. : 0.00019 Min. : 0.0000
1st Qu.: 37.9295 1st Qu.: 699 1st Qu.: 0.13239 1st Qu.: 0.1034
Median : 74.7538 Median : 1044 Median : 0.19773 Median : 0.1378
Mean : 83.9442 Mean : 1624 Mean : 0.30760 Mean : 0.1918
3rd Qu.: 120.0635 3rd Qu.: 1518 3rd Qu.: 0.28750 3rd Qu.: 0.1828
Max. :2103.7943 Max. :1323464 Max. :250.65606 Max. :24.1507
TD_Mi_q2 TD_Mi_q98 TD_Mi_SS_q5 TD_Mi_SS_q95
Min. :0.05208 Min. :0.9585 Min. : 0.00019 Min. : 0.00019
1st Qu.:0.05208 1st Qu.:0.9585 1st Qu.: 0.08848 1st Qu.: 0.25878
Median :0.05208 Median :0.9585 Median : 0.10608 Median : 0.32239
Mean :0.05208 Mean :0.9585 Mean : 0.16872 Mean : 0.47949
3rd Qu.:0.05208 3rd Qu.:0.9585 3rd Qu.: 0.13977 3rd Qu.: 0.42822
Max. :0.05208 Max. :0.9585 Max. :219.16288 Max. :246.94938
TD_Mi_SSHG_q5 TD_Mi_SSHG_q95 TD_Mi_Mean TD_Mi_Mean_F
Min. : 0.00019 Min. : 0.00019 Min. :0.3076 Min. :0.2318
1st Qu.: 0.09167 1st Qu.: 0.24754 1st Qu.:0.3076 1st Qu.:0.2318
Median : 0.11395 Median : 0.31174 Median :0.3076 Median :0.2318
Mean : 0.18528 Mean : 0.46625 Mean :0.3076 Mean :0.2318
3rd Qu.: 0.15093 3rd Qu.: 0.41899 3rd Qu.:0.3076 3rd Qu.:0.2318
Max. :250.65606 Max. :250.65606 Max. :0.3076 Max. :0.2318
TD_Mi_SS_Mean TD_Mi_SS_Mean_F TD_Mi_SSHG_Mean TD_Mi_SSHG_Mean_F
Min. : 0.00019 Min. : 0.0002 Min. : 0.00019 Min. : 0.000
1st Qu.: 0.17129 1st Qu.: 0.1663 1st Qu.: 0.16760 1st Qu.: 0.163
Median : 0.21082 Median : 0.2058 Median : 0.20965 Median : 0.206
Mean : 0.30760 Mean : 0.2916 Mean : 0.30760 Mean : 0.294
3rd Qu.: 0.26422 3rd Qu.: 0.2582 3rd Qu.: 0.26616 3rd Qu.: 0.262
Max. :219.16288 Max. :219.1629 Max. :250.65606 Max. :250.656
NA's :2678 NA's :4904
TD_Mi_Med TD_Mi_Med_F TD_Mi_SS_Med TD_Mi_SS_Med_F
Min. :0.1977 Min. :0.1977 Min. : 0.00019 Min. : 0.0002
1st Qu.:0.1977 1st Qu.:0.1977 1st Qu.: 0.14602 1st Qu.: 0.1458
Median :0.1977 Median :0.1977 Median : 0.19470 Median : 0.1947
Mean :0.1977 Mean :0.1977 Mean : 0.28931 Mean : 0.2827
3rd Qu.:0.1977 3rd Qu.:0.1977 3rd Qu.: 0.26326 3rd Qu.: 0.2633
Max. :0.1977 Max. :0.1977 Max. :219.16288 Max. :219.1629
NA's :2678
TD_Mi_SSHG_Med TD_Mi_SSHG_Med_F TD_Mi_Cnt TD_Mi_Cnt_F
Min. : 0.00019 Min. : 0.000 Min. :2486795 Min. :2387406
1st Qu.: 0.14403 1st Qu.: 0.144 1st Qu.:2486795 1st Qu.:2387406
Median : 0.19527 Median : 0.195 Median :2486795 Median :2387406
Mean : 0.29152 Mean : 0.285 Mean :2486795 Mean :2387406
3rd Qu.: 0.26657 3rd Qu.: 0.266 3rd Qu.:2486795 3rd Qu.:2387406
Max. :250.65606 Max. :250.656 Max. :2486795 Max. :2387406
NA's :4904
TD_Mi_SS_Cnt TD_Mi_SS_Cnt_F TD_Mi_SSHG_Cnt TD_Mi_SSHG_Cnt_F
Min. : 1.0 Min. : 0.0 Min. : 1.00 Min. : 0.00
1st Qu.: 178.0 1st Qu.: 160.0 1st Qu.: 28.00 1st Qu.: 24.00
Median : 295.0 Median : 266.0 Median : 48.00 Median : 42.00
Mean : 363.3 Mean : 327.1 Mean : 60.01 Mean : 53.31
3rd Qu.: 476.0 3rd Qu.: 428.0 3rd Qu.: 78.00 3rd Qu.: 70.00
Max. :1543.0 Max. :1388.0 Max. :663.00 Max. :595.00
TravelTime_Sec TT_Sec_q2 TT_Sec_q98 TT_Sec_SS_q5 TT_Sec_SS_q95
Min. : 1 Min. :10 Min. :349 Min. : 1.00 Min. : 1.00
1st Qu.: 24 1st Qu.:10 1st Qu.:349 1st Qu.: 15.00 1st Qu.: 47.00
Median : 38 Median :10 Median :349 Median : 21.00 Median : 77.75
Mean : 100 Mean :10 Mean :349 Mean : 57.38 Mean : 176.22
3rd Qu.: 70 3rd Qu.:10 3rd Qu.:349 3rd Qu.: 32.00 3rd Qu.: 129.65
Max. :54551 Max. :10 Max. :349 Max. :54551.00 Max. :54551.00
NA's :28
TT_Sec_SSHG_q5 TT_Sec_SSHG_q95 TT_Sec_Mean TT_Sec_Mean_F
Min. : 1.00 Min. : 1.00 Min. :104.9 Min. :56.61
1st Qu.: 15.20 1st Qu.: 42.70 1st Qu.:104.9 1st Qu.:56.61
Median : 22.50 Median : 70.55 Median :104.9 Median :56.61
Mean : 62.94 Mean : 161.25 Mean :104.9 Mean :56.61
3rd Qu.: 34.80 3rd Qu.: 119.60 3rd Qu.:104.9 3rd Qu.:56.61
Max. :54551.00 Max. :54551.00 Max. :104.9 Max. :56.61
TT_Sec_SS_Mean TT_Sec_SS_Mean_F TT_Sec_SSHG_Mean TT_Sec_SSHG_Mean_F
Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
1st Qu.: 28.20 1st Qu.: 26.62 1st Qu.: 27.51 1st Qu.: 26.34
Median : 42.61 Median : 40.46 Median : 41.76 Median : 39.96
Mean : 99.62 Mean : 86.96 Mean : 99.55 Mean : 88.81
3rd Qu.: 69.71 3rd Qu.: 66.44 3rd Qu.: 70.02 3rd Qu.: 67.22
Max. :54551.00 Max. :54551.00 Max. :54551.00 Max. :54551.00
NA's :2603 NA's :3772
TT_Sec_Med TT_Sec_Med_F TT_Sec_SS_Med TT_Sec_SS_Med_F
Min. :39 Min. :39 Min. : 1.00 Min. : 1.00
1st Qu.:39 1st Qu.:39 1st Qu.: 25.00 1st Qu.: 25.00
Median :39 Median :39 Median : 37.00 Median : 37.00
Mean :39 Mean :39 Mean : 86.88 Mean : 80.62
3rd Qu.:39 3rd Qu.:39 3rd Qu.: 62.00 3rd Qu.: 62.00
Max. :39 Max. :39 Max. :54551.00 Max. :54551.00
NA's :2603
TT_Sec_SSHG_Med TT_Sec_SSHG_Med_F TT_Sec_Cnt TT_Sec_Cnt_F
Min. : 1.00 Min. : 1.00 Min. :2802888 Min. :2705189
1st Qu.: 25.00 1st Qu.: 25.00 1st Qu.:2802888 1st Qu.:2705189
Median : 37.00 Median : 37.00 Median :2802888 Median :2705189
Mean : 90.07 Mean : 83.87 Mean :2802888 Mean :2705189
3rd Qu.: 64.00 3rd Qu.: 64.00 3rd Qu.:2802888 3rd Qu.:2705189
Max. :54551.00 Max. :54551.00 Max. :2802888 Max. :2705189
NA's :3772
TT_Sec_SS_Cnt TT_Sec_SS_Cnt_F TT_Sec_SSHG_Cnt TT_Sec_SSHG_Cnt_F
Min. : 1.0 Min. : 0.0 Min. : 1.0 Min. : 0.00
1st Qu.: 200.0 1st Qu.: 183.0 1st Qu.: 30.0 1st Qu.: 27.00
Median : 321.0 Median : 292.0 Median : 52.0 Median : 47.00
Mean : 392.4 Mean : 357.2 Mean : 64.7 Mean : 58.23
3rd Qu.: 509.0 3rd Qu.: 464.0 3rd Qu.: 84.0 3rd Qu.: 76.00
Max. :1664.0 Max. :1523.0 Max. :691.0 Max. :634.00
TravelTime_Hr TT_Hr_q2 TT_Hr_q98 TT_Hr_SS_q5
Min. : 0.000278 Min. :0.002778 Min. :0.09694 Min. : 0.000278
1st Qu.: 0.006667 1st Qu.:0.002778 1st Qu.:0.09694 1st Qu.: 0.004167
Median : 0.010556 Median :0.002778 Median :0.09694 Median : 0.005833
Mean : 0.027782 Mean :0.002778 Mean :0.09694 Mean : 0.015938
3rd Qu.: 0.019444 3rd Qu.:0.002778 3rd Qu.:0.09694 3rd Qu.: 0.008889
Max. :15.153056 Max. :0.002778 Max. :0.09694 Max. :15.153056
NA's :28
TT_Hr_SS_q95 TT_Hr_SSHG_q5 TT_Hr_SSHG_q95 TT_Hr_Mean
Min. : 0.000278 Min. : 0.000278 Min. : 0.000278 Min. :0.02913
1st Qu.: 0.013056 1st Qu.: 0.004222 1st Qu.: 0.011861 1st Qu.:0.02913
Median : 0.021597 Median : 0.006250 Median : 0.019597 Median :0.02913
Mean : 0.048950 Mean : 0.017485 Mean : 0.044792 Mean :0.02913
3rd Qu.: 0.036014 3rd Qu.: 0.009667 3rd Qu.: 0.033222 3rd Qu.:0.02913
Max. :15.153056 Max. :15.153056 Max. :15.153056 Max. :0.02913
TT_Hr_Mean_F TT_Hr_SS_Mean TT_Hr_SS_Mean_F TT_Hr_SSHG_Mean
Min. :0.01573 Min. : 0.000278 Min. : 0.0003 Min. : 0.000278
1st Qu.:0.01573 1st Qu.: 0.007832 1st Qu.: 0.0074 1st Qu.: 0.007643
Median :0.01573 Median : 0.011836 Median : 0.0112 Median : 0.011600
Mean :0.01573 Mean : 0.027673 Mean : 0.0242 Mean : 0.027654
3rd Qu.:0.01573 3rd Qu.: 0.019363 3rd Qu.: 0.0185 3rd Qu.: 0.019450
Max. :0.01573 Max. :15.153056 Max. :15.1531 Max. :15.153056
NA's :2612
TT_Hr_SSHG_Mean_F TT_Hr_Med TT_Hr_Med_F TT_Hr_SS_Med
Min. : 0.000 Min. :0.01083 Min. :0.01083 Min. : 0.000278
1st Qu.: 0.007 1st Qu.:0.01083 1st Qu.:0.01083 1st Qu.: 0.006944
Median : 0.011 Median :0.01083 Median :0.01083 Median : 0.010278
Mean : 0.025 Mean :0.01083 Mean :0.01083 Mean : 0.024132
3rd Qu.: 0.019 3rd Qu.:0.01083 3rd Qu.:0.01083 3rd Qu.: 0.017222
Max. :15.153 Max. :0.01083 Max. :0.01083 Max. :15.153056
NA's :3842
TT_Hr_SS_Med_F TT_Hr_SSHG_Med TT_Hr_SSHG_Med_F TT_Hr_Cnt
Min. : 0.0003 Min. : 0.000278 Min. : 0.000 Min. :2802888
1st Qu.: 0.0069 1st Qu.: 0.006944 1st Qu.: 0.007 1st Qu.:2802888
Median : 0.0103 Median : 0.010278 Median : 0.010 Median :2802888
Mean : 0.0224 Mean : 0.025019 Mean : 0.023 Mean :2802888
3rd Qu.: 0.0172 3rd Qu.: 0.017778 3rd Qu.: 0.018 3rd Qu.:2802888
Max. :15.1531 Max. :15.153056 Max. :15.153 Max. :2802888
NA's :2612 NA's :3842
TT_Hr_Cnt_F TT_Hr_SS_Cnt TT_Hr_SS_Cnt_F TT_Hr_SSHG_Cnt TT_Hr_SSHG_Cnt_F
Min. :2705189 Min. : 1.0 Min. : 0 Min. : 1.0 Min. : 0.00
1st Qu.:2705189 1st Qu.: 200.0 1st Qu.: 183 1st Qu.: 30.0 1st Qu.: 27.00
Median :2705189 Median : 321.0 Median : 292 Median : 52.0 Median : 47.00
Mean :2705189 Mean : 392.4 Mean : 357 Mean : 64.7 Mean : 58.19
3rd Qu.:2705189 3rd Qu.: 509.0 3rd Qu.: 464 3rd Qu.: 84.0 3rd Qu.: 76.00
Max. :2705189 Max. :1664.0 Max. :1523 Max. :691.0 Max. :634.00
SpeedAvg_Mph
Min. : 0.00
1st Qu.: 10.10
Median : 16.68
Mean : 26.54
3rd Qu.: 31.17
Max. :22924.09
NA's :28
Investigation of TravelDistance_Mi.
Let’s plot just the TravelDistance_Mi values that are NOT “NA”.
TravDistMi_HistDen <- ggplot(select(TravelDistance_Mi_NoNA,
TravelDistance_Mi
),
aes(x = TravelDistance_Mi,
y = ..density..
)
) +
geom_histogram(binwidth = 0.05, fill = "lightblue", colour = "grey60", size = 0.2) +
geom_line(stat = "density", colour = "red") +
coord_cartesian(xlim = c(0, 1.5), ylim = c(0, 4.0)
) +
labs(title = "Variation in Distance Between Stops",
x = "Travel Distance (miles)",
y = "Density"
)
TravDistMi_HistDen
Investigation of TravelDistance_Mi.
Looking at the extremely large TravelDistance_Mi values. Some (aprox 27%) of TravelDistance_Mi values > 1 mile are when the DirChange2 changes…but what about the other ~73%?
rm(TravelDistance_Mi_NoNA)
# examples of weirdly large TravelDistance_Mi
View(filter(AllDays_NewOrder,
TravelDistance_Mi > 1.1587121212 # 1.1587121212 is the 99th percentile
) %>%
arrange(desc(TravelDistance_Mi)
)
)
# Why are these extremes? Airports? Bus collection points?
View(filter(AllDays_NewOrder,
between(RowNum_OG, 494044, 494064) | # 494054
between(RowNum_OG, 494273, 494293) | # 494283
between(RowNum_OG, 494626, 494646) | # 494636
between(RowNum_OG, 1610156, 1610176) | # 1610166
between(RowNum_OG, 2073074, 2073094) # 2073084
)
)
# Before Removing Runs
# View(filter(AllDays_Sorted,
# between(RowNum_OG, 494044, 494064) | # 494054
# between(RowNum_OG, 494273, 494293) | # 494283
# between(RowNum_OG, 494626, 494646) | # 494636
# between(RowNum_OG, 1610156, 1610176) | # 1610166
# between(RowNum_OG, 2073074, 2073094) # 2073084
# )
# )
# After Removing Runs
# View(filter(AllDays_FirstStopID,
# between(RowNum_OG, 494044, 494064) | # 494054
# between(RowNum_OG, 494273, 494293) | # 494283
# between(RowNum_OG, 494626, 494646) | # 494636
# between(RowNum_OG, 1610156, 1610176) | # 1610166
# between(RowNum_OG, 2073074, 2073094) # 2073084
# )
# )
Investigation of TravelDistance_Mi.
Any relation with DirChange2? Doesn’t look as if this is so.
ExtremeTravDist <- filter(AllDays_NewOrder,
!is.na(TravelDistance_Mi)
) %>%
mutate(TravDist_Extreme = ifelse(TravelDistance_Mi > 1.1587121212, # 1.1587121212 is the 99th percentile
"True",
"False"
)
) %>%
group_by(DirChange2, TravDist_Extreme) %>%
summarise(TravDistMI_ExtCnts = n()
)
# ExtremeTravDist
ExtremeTravDist_Spread <- as.data.frame(spread(ExtremeTravDist,
TravDist_Extreme,
TravDistMI_ExtCnts
)
) %>%
select(False,
True
)
row.names(ExtremeTravDist_Spread) <- c("Change", "Same")
# str(ExtremeTravDist_Spread)
# ExtremeTravDist_Spread
prop.table(as.table(as.matrix(ExtremeTravDist_Spread)
),
1
)
False True
Change 0.80622587 0.19377413
Same 0.98855456 0.01144544
prop.table(as.table(as.matrix(ExtremeTravDist_Spread)
),
2
)
False True
Change 0.01594448 0.25169594
Same 0.98405552 0.74830406
Investigation of TravelDistance_Mi.
Looking at specific buses and StartStop_ID.
rm(ExtremeTravDist, ExtremeTravDist_Spread)
View(arrange(group_by(AllDays_NewOrder,
Bus_ID
) %>%
summarise(DistTrav_Mean = mean(TravelDistance_Mi, na.rm = TRUE),
DistTrav_Med = median(TravelDistance_Mi, na.rm = TRUE)
),
desc(DistTrav_Med)
)
)
# example of extremely small TravelDistance_Mi values (looks like the odometer wasn't functioning)
View(filter(AllDays_NewOrder,
Bus_ID == 6111 |
Bus_ID == 7201 |
Bus_ID == 8058
) %>%
arrange(Bus_ID, Event_Time)
)
View(arrange(group_by(AllDays_NewOrder,
StartStop_ID
) %>%
summarise(DistTrav_Mean = mean(TravelDistance_Mi, na.rm = TRUE),
DistTrav_Med = median(TravelDistance_Mi, na.rm = TRUE)
),
desc(DistTrav_Med)
)
)
# example of extremely large TravelDistance_Mi values...no idea why...
View(filter(AllDays_NewOrder,
StartStop_ID == "1003665--12" |
StartStop_ID == "1003665--5001925" |
StartStop_ID == "3001038--3002565"
) %>%
arrange(StartStop_ID, Event_Time)
)
Investigation of TravelDistance_Mi & TravelDistance_Mi_New.
If TravelDisntace_Mi is below the 5th percentile for that StartStop_ID, or if TravelDisntace_Mi is above the 95th percentile for that StartStop_ID, or if TravelDistance_Mi is NA (when the BusDay_EventNum !=1), consider this an outlier. In this case, replace the value with the mean for that StartStop_ID and HourGroup (TD_Mi_SSHG_Mean_F), or if there are not enough values at the HourGroup level, replace it with the mean for that StartStop_ID.
# View(tail(AllDays_NewOrder, 500))
AllDays_NewTravelDist <-
mutate(AllDays_NewOrder,
TravelDistance_Mi_New = ifelse(!is.na(TravelDistance_Mi) &
(TravelDistance_Mi < TD_Mi_SSHG_q5 |
TravelDistance_Mi > TD_Mi_SSHG_q95
) &
TD_Mi_SSHG_Cnt_F >= 20,
TD_Mi_SSHG_Mean_F,
ifelse(!is.na(TravelDistance_Mi) &
(TravelDistance_Mi < TD_Mi_SSHG_q5 |
TravelDistance_Mi > TD_Mi_SSHG_q95
) &
TD_Mi_SSHG_Cnt_F < 20 &
TD_Mi_SS_Cnt_F >= 20,
TD_Mi_SS_Mean_F,
ifelse(!is.na(TravelDistance_Mi) &
(TravelDistance_Mi < TD_Mi_SSHG_q5 |
TravelDistance_Mi > TD_Mi_SSHG_q95
) &
TD_Mi_SS_Cnt_F < 20 &
TD_Mi_SS_Cnt >= 20,
TD_Mi_SS_Mean,
ifelse(is.na(TravelDistance_Mi) &
BusDay_EventNum != 1 &
TravelDistance_Mi_Hvrs != 0,
TravelDistance_Mi_Hvrs,
ifelse(is.na(TravelDistance_Mi) &
BusDay_EventNum != 1 &
TravelDistance_Mi_Hvrs == 0,
TD_Mi_SS_Mean,
TravelDistance_Mi
))))),
TravelDistance_Mi_New_Label =
factor(ifelse(!is.na(TravelDistance_Mi) &
(TravelDistance_Mi < TD_Mi_SSHG_q5 |
TravelDistance_Mi > TD_Mi_SSHG_q95
) &
TD_Mi_SSHG_Cnt_F >= 20,
"TD_Mi_SSHG_Mean_F",
ifelse(!is.na(TravelDistance_Mi) &
(TravelDistance_Mi < TD_Mi_SSHG_q5 |
TravelDistance_Mi > TD_Mi_SSHG_q95
) &
TD_Mi_SSHG_Cnt_F < 20 &
TD_Mi_SS_Cnt_F >= 20,
"TD_Mi_SS_Mean_F",
ifelse(!is.na(TravelDistance_Mi) &
(TravelDistance_Mi < TD_Mi_SSHG_q5 |
TravelDistance_Mi > TD_Mi_SSHG_q95
) &
TD_Mi_SS_Cnt_F < 20 &
TD_Mi_SS_Cnt >= 20,
"TD_Mi_SS_Mean",
ifelse(is.na(TravelDistance_Mi) &
BusDay_EventNum != 1 &
TravelDistance_Mi_Hvrs != 0,
"TravelDistance_Mi_Hvrs",
ifelse(is.na(TravelDistance_Mi) &
BusDay_EventNum != 1 &
TravelDistance_Mi_Hvrs == 0,
"TD_Mi_SS_Mean",
"TravelDistance_Mi"
)))))
),
TravelDistance_Mi_NewHvrs = ifelse(!is.na(TravelDistance_Mi_Hvrs) &
TravelDistance_Mi_Hvrs != 0 &
(TravelDistance_Mi_New < TD_Mi_q2 |
TravelDistance_Mi_New > TD_Mi_q98
),
TravelDistance_Mi_Hvrs,
TravelDistance_Mi_New
),
TravelDistance_Mi_NewHvrs_Label =
factor(ifelse(!is.na(TravelDistance_Mi_Hvrs) &
TravelDistance_Mi_Hvrs != 0 &
(TravelDistance_Mi_New < TD_Mi_q2 |
TravelDistance_Mi_New > TD_Mi_q98
),
"TravelDistance_Mi_Hvrs",
as.character(TravelDistance_Mi_New_Label)
)
),
SpeedAvg_Mph_NewHvrs = TravelDistance_Mi_NewHvrs / TravelTime_Hr
)
str(AllDays_NewTravelDist)
'data.frame': 2809529 obs. of 119 variables:
$ RowNum_OG : int 1 3 4 5 6 7 9 10 11 12 ...
$ group : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ StartStop_ID : chr "NULL--5004572" "5004572--5004573" "5004573--5002210" "5002210--5002209" ...
$ BusDay_EventNum : int 1 2 3 4 5 6 7 8 9 10 ...
$ Bus_ID : int 11 11 11 11 11 11 11 11 11 11 ...
$ Route : chr "S80" "S80" "S80" "S80" ...
$ RteChange2 : Factor w/ 2 levels "Change","Same": 1 2 2 2 2 2 2 2 2 2 ...
$ RouteAlt : Factor w/ 14 levels "1","10","11",..: 1 1 1 1 1 1 6 6 6 6 ...
$ DirChange2 : Factor w/ 2 levels "Change","Same": 1 2 2 2 2 2 1 2 2 2 ...
$ Route_Direction : Factor w/ 12 levels "","ANTICLKW",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Stop_Sequence : int 7 6 3 2 8 1 2 3 4 2 ...
$ Start_ID : chr NA "5004572" "5004573" "5002210" ...
$ Start_Desc : chr NA "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" ...
$ StopID_Clean : chr "5004572" "5004573" "5002210" "5002209" ...
$ StopID_Indicator : Factor w/ 2 levels "ID_Bad","ID_OK": 2 2 2 2 2 2 2 2 2 2 ...
$ Stop_Desc : chr "BEULAH ST + CHARLES ARRINGTON DR" "WALKER LN + #6363" "WALKER LN + BEULAH ST" "BEULAH ST + CHARLES ARRINGTON DR" ...
$ Event_Type : int 4 4 4 4 3 3 4 4 4 4 ...
$ Event_Description : Factor w/ 3 levels "Serviced Stop ",..: 3 3 3 3 1 1 3 3 3 3 ...
$ Event_Time_Yr : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ Event_Time_Mth : int 10 10 10 10 10 10 10 10 10 10 ...
$ Event_Time_Date : int 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Event_Time_Hr : int 6 6 6 6 6 6 6 6 6 6 ...
$ Event_Time_HrGroup : Ord.factor w/ 8 levels "Group0_2"<"Group3_5"<..: 3 3 3 3 3 3 3 3 3 3 ...
$ Event_Time_Min : int 6 9 10 10 13 14 21 21 23 23 ...
$ Event_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Departure_Time : POSIXct, format: "2016-10-03 06:06:47" "2016-10-03 06:09:47" ...
$ Dwell_Time : int 0 0 0 0 0 104 0 0 0 0 ...
$ Dwell_Time2 : num 0 0 0 0 0 104 0 0 0 0 ...
$ Delta_Time : int -177 24 165 25 73 719 74 76 63 69 ...
$ Latitude : num 38.8 38.8 38.8 38.8 38.8 ...
$ Longitude : num -77.2 -77.2 -77.2 -77.2 -77.2 ...
$ Heading : int 199 97 276 15 119 100 274 104 241 274 ...
$ Odometer_Distance : int 43543 45139 46418 50115 51074 51303 55633 56163 56285 57262 ...
$ Odometer_Distance_Lag1 : int NA 43543 45139 46418 50115 51074 51303 55633 56163 56285 ...
$ Odometer_Distance_Mi : num 8.25 8.55 8.79 9.49 9.67 ...
$ TravelDistance_Ft : int NA 1596 1279 3697 959 229 4330 530 122 977 ...
$ TravelDistance_Mi : num NA 0.302 0.242 0.7 0.182 ...
$ TravelDistance_Mi_Hvrs : num NA 0.15 0.105 0.165 0.832 ...
$ TD_Mi_q2 : num 0.0521 0.0521 0.0521 0.0521 0.0521 ...
$ TD_Mi_q98 : num 0.959 0.959 0.959 0.959 0.959 ...
$ TD_Mi_SS_q5 : num NA 0.0252 0.2422 0.7324 0.0794 ...
$ TD_Mi_SS_q95 : num NA 0.626 0.242 1.008 0.176 ...
$ TD_Mi_SSHG_q5 : num NA 0.0996 0.2422 0.7002 0.1816 ...
$ TD_Mi_SSHG_q95 : num NA 0.627 0.242 0.7 0.182 ...
$ TD_Mi_Mean : num 0.308 0.308 0.308 0.308 0.308 ...
$ TD_Mi_Mean_F : num 0.232 0.232 0.232 0.232 0.232 ...
$ TD_Mi_SS_Mean : num NaN 0.437 0.242 0.908 0.128 ...
$ TD_Mi_SS_Mean_F : num NaN 0.457 0.242 0.977 NaN ...
$ TD_Mi_SSHG_Mean : num NaN 0.442 0.242 0.7 0.182 ...
$ TD_Mi_SSHG_Mean_F : num NaN 0.491 0.242 0.7 0.182 ...
$ TD_Mi_Med : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_Med_F : num 0.198 0.198 0.198 0.198 0.198 ...
$ TD_Mi_SS_Med : num NA 0.512 0.242 0.962 0.128 ...
$ TD_Mi_SS_Med_F : num NA 0.512 0.242 1.008 NA ...
$ TD_Mi_SSHG_Med : num NA 0.512 0.242 0.7 0.182 ...
$ TD_Mi_SSHG_Med_F : num NA 0.512 0.242 0.7 0.182 ...
$ TD_Mi_Cnt : int 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 2486795 ...
$ TD_Mi_Cnt_F : int 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 2387406 ...
$ TD_Mi_SS_Cnt : int 0 14 1 4 2 87 22 118 91 11 ...
$ TD_Mi_SS_Cnt_F : int 0 12 1 3 0 77 18 106 81 9 ...
$ TD_Mi_SSHG_Cnt : int 0 7 1 1 1 23 6 29 28 3 ...
$ TD_Mi_SSHG_Cnt_F : int 0 5 1 1 1 19 4 25 24 1 ...
$ TravelTime_Sec : num NA 180 37 25 190 29 288 52 76 8 ...
$ TT_Sec_q2 : num 10 10 10 10 10 10 10 10 10 10 ...
$ TT_Sec_q98 : num 349 349 349 349 349 349 349 349 349 349 ...
$ TT_Sec_SS_q5 : num NA 11.9 37 30.5 172.9 ...
$ TT_Sec_SS_q95 : num NA 346.3 37 75.8 189.1 ...
$ TT_Sec_SSHG_q5 : num NA 59.6 37 25 190 11.6 236 51.5 55 8.8 ...
$ TT_Sec_SSHG_q95 : num NA 276 37 25 190 ...
$ TT_Sec_Mean : num 105 105 105 105 105 ...
$ TT_Sec_Mean_F : num 56.6 56.6 56.6 56.6 56.6 ...
$ TT_Sec_SS_Mean : num NaN 215.8 37 58.2 181 ...
$ TT_Sec_SS_Mean_F : num NaN 218.9 37 65.5 NaN ...
$ TT_Sec_SSHG_Mean : num NaN 202 37 25 190 ...
$ TT_Sec_SSHG_Mean_F : num NaN 226 37 25 190 ...
$ TT_Sec_Med : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_Med_F : num 39 39 39 39 39 39 39 39 39 39 ...
$ TT_Sec_SS_Med : num NA 223.5 37 65.5 181 ...
$ TT_Sec_SS_Med_F : num NA 223.5 37 65.5 NA ...
$ TT_Sec_SSHG_Med : num NA 219 37 25 190 134 286 60 65 16 ...
$ TT_Sec_SSHG_Med_F : num NA 219 37 25 190 134 286 60 65 16 ...
$ TT_Sec_Cnt : int 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 2802888 ...
$ TT_Sec_Cnt_F : int 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 2705189 ...
$ TT_Sec_SS_Cnt : int 0 14 1 4 2 173 22 141 141 11 ...
$ TT_Sec_SS_Cnt_F : int 0 12 1 2 0 156 18 127 128 9 ...
$ TT_Sec_SSHG_Cnt : int 0 7 1 1 1 35 6 36 35 3 ...
$ TT_Sec_SSHG_Cnt_F : int 0 5 1 1 1 31 4 32 32 1 ...
$ TravelTime_Hr : num NA 0.05 0.01028 0.00694 0.05278 ...
$ TT_Hr_q2 : num 0.00278 0.00278 0.00278 0.00278 0.00278 ...
$ TT_Hr_q98 : num 0.0969 0.0969 0.0969 0.0969 0.0969 ...
$ TT_Hr_SS_q5 : num NA 0.00331 0.01028 0.00849 0.04803 ...
$ TT_Hr_SS_q95 : num NA 0.0962 0.0103 0.0211 0.0525 ...
$ TT_Hr_SSHG_q5 : num NA 0.01656 0.01028 0.00694 0.05278 ...
$ TT_Hr_SSHG_q95 : num NA 0.07653 0.01028 0.00694 0.05278 ...
$ TT_Hr_Mean : num 0.0291 0.0291 0.0291 0.0291 0.0291 ...
$ TT_Hr_Mean_F : num 0.0157 0.0157 0.0157 0.0157 0.0157 ...
$ TT_Hr_SS_Mean : num NaN 0.0599 0.0103 0.0162 0.0503 ...
$ TT_Hr_SS_Mean_F : num NaN 0.0608 0.0103 0.0182 NaN ...
[list output truncated]
Investigation of TravelDistance_Mi & TravelDistance_Mi_Hvrs & TravelDistance_Mi_New.
Quick summary and then correlation calculation.
rm(AllDays_NewOrder)
# 38 rows meet this criteria anymore -- appears to be the case when both the Lat Long calculations, and the TravelDistance calculations did not function properly.
View(filter(AllDays_NewTravelDist,
is.na(TravelDistance_Mi_New) &
BusDay_EventNum != 1
)
)
View(AllDays_NewTravelDist %>%
arrange(desc(TravelDistance_Mi_New)) %>%
head(500)
)
summary(select(AllDays_NewTravelDist,
TravelDistance_Mi,
TravelDistance_Mi_Hvrs,
TravelDistance_Mi_New,
TravelDistance_Mi_NewHvrs
)
)
TravelDistance_Mi TravelDistance_Mi_Hvrs TravelDistance_Mi_New
Min. : 0.0 Min. : 0.000 Min. : 0.000
1st Qu.: 0.1 1st Qu.: 0.106 1st Qu.: 0.141
Median : 0.2 Median : 0.142 Median : 0.199
Mean : 0.3 Mean : 0.201 Mean : 0.298
3rd Qu.: 0.3 3rd Qu.: 0.193 3rd Qu.: 0.276
Max. :250.7 Max. :24.407 Max. :250.656
NA's :322734 NA's :6528 NA's :6566
TravelDistance_Mi_NewHvrs
Min. : 0.000
1st Qu.: 0.142
Median : 0.199
Mean : 0.259
3rd Qu.: 0.276
Max. :36.236
NA's :6566
summary(select(filter(AllDays_NewTravelDist,
BusDay_EventNum != 1
),
TravelDistance_Mi,
TravelDistance_Mi_Hvrs,
TravelDistance_Mi_New,
TravelDistance_Mi_NewHvrs
)
)
TravelDistance_Mi TravelDistance_Mi_Hvrs TravelDistance_Mi_New
Min. : 0.00 Min. : 0.0000 Min. : 0.00019
1st Qu.: 0.13 1st Qu.: 0.1055 1st Qu.: 0.14072
Median : 0.20 Median : 0.1424 Median : 0.19867
Mean : 0.31 Mean : 0.2008 Mean : 0.29751
3rd Qu.: 0.29 3rd Qu.: 0.1935 3rd Qu.: 0.27633
Max. :250.66 Max. :24.4068 Max. :250.65606
NA's :316206 NA's :38
TravelDistance_Mi_NewHvrs
Min. : 0.00019
1st Qu.: 0.14205
Median : 0.19903
Mean : 0.25859
3rd Qu.: 0.27557
Max. :36.23636
NA's :38
cor(select(AllDays_NewTravelDist,
TravelDistance_Mi,
TravelDistance_Mi_Hvrs,
TravelDistance_Mi_New,
TravelDistance_Mi_NewHvrs
),
use = "pairwise.complete.obs"
)
TravelDistance_Mi TravelDistance_Mi_Hvrs
TravelDistance_Mi 1.0000000 0.5447660
TravelDistance_Mi_Hvrs 0.5447660 1.0000000
TravelDistance_Mi_New 0.9513379 0.5837182
TravelDistance_Mi_NewHvrs 0.6005944 0.9005277
TravelDistance_Mi_New TravelDistance_Mi_NewHvrs
TravelDistance_Mi 0.9513379 0.6005944
TravelDistance_Mi_Hvrs 0.5837182 0.9005277
TravelDistance_Mi_New 1.0000000 0.6346981
TravelDistance_Mi_NewHvrs 0.6346981 1.0000000
Investigation of TravelDistance_Mi_NewHvrs_Label & TravelDistance_Mi_NewHvrs_Label.
Show how the labels changed.
group_by(AllDays_NewTravelDist,
TravelDistance_Mi_New_Label,
TravelDistance_Mi_NewHvrs_Label
) %>%
summarise(CntNum = n(),
CntPct = format(CntNum / nrow(AllDays_NewTravelDist),
scientific = 9999
)
) %>%
arrange(desc(CntPct)
)
Investigation of TravelDistance_Mi & TravelDistance_Mi_Hvrs & TravelDistance_Mi_New.
Graphing the two methods of calculating TravelDistance_Mi.
First, let’s get create a function to plot the liner model equation.
lm_eqn <- function(df, y, x){
m <- lm(y ~ x, df)
l <- list(a = format(coef(m)[1], digits = 2),
b = format(abs(coef(m)[2]), digits = 2),
s1 = ifelse(test = coef(m)[2] > 0,
yes = "+",
no = "-"
),
r2 = format(summary(m)$r.squared,
digits = 3
)
)
eq <- substitute(italic(y) == a~~s1~~b %.% italic(x)*","~~italic(r)^2~"="~r2,
l
)
as.character(as.expression(eq)
)
}
Investigation of TravelDistance_Mi & TravelDistance_Mi_NewHvrs.
Scatter plot (using a 10% sample to making plotting time faster and to reduce un-needed data in the “same” splot).
set.seed(123456789)
AllDays_NewTravelDist_10Pct <- filter(AllDays_NewTravelDist,
!is.na(TravelDistance_Mi_NewHvrs) &
!is.na(TravelDistance_Mi)
) %>%
rename(DistMethod = TravelDistance_Mi_NewHvrs_Label) %>%
sample_frac(0.1)
TravDist_MiVsCalc <- ggplot(select(AllDays_NewTravelDist_10Pct,
TravelDistance_Mi_NewHvrs,
TravelDistance_Mi,
DistMethod
),
aes(x = TravelDistance_Mi,
y = TravelDistance_Mi_NewHvrs,
colour = DistMethod
)
) +
scale_colour_manual(values = c("red","blue", "green", "orange", "black")
) +
geom_point(shape = 1, alpha = 0.5) +
scale_shape(solid = FALSE) +
geom_smooth(method = "lm", colour = "blue") +
geom_abline(intercept = 0, slope = 1, colour = "red") +
coord_cartesian(xlim = c(0, 1.5), ylim = c(0, 1.5)
) +
scale_x_continuous(breaks = seq(0, 1.5, 0.25)
) +
scale_y_continuous(breaks = seq(0, 1.5, 0.25)
) +
theme(legend.position = "bottom", #c(0.85, 0.40),
legend.text = element_text(size = 6)
) +
annotate(label = lm_eqn(df = AllDays_NewTravelDist_10Pct,
x = AllDays_NewTravelDist_10Pct$TravelDistance_Mi,
y = AllDays_NewTravelDist_10Pct$TravelDistance_Mi_NewHvrs
),
# x = 62,
# y = 20,
x = 0.70,
y = 0.00,
geom = "text",
size = 3,
colour = "blue",
parse = TRUE
) +
annotate(label = "Reference Line (slope = 1)",
# x = 16,
# y = 30,
x = 0.80,
y = 1.05,
geom = "text",
size = 3,
colour = "red"
) +
labs(title = "TravelDistance_Mi vs. TravelDistance_Mi_NewHvrs",
x = "TravelDistance_Mi",
y = "TravelDistance_Mi_NewHvrs"
)
# +
# geom_jitter()
TravDist_MiVsCalc
Investigation of TravelDistance_Mi & TravelDistance_Mi_Hvrs & TravelDistance_Mi_New.
Graphing test with rbokeh.
TravDist_MiVsCalc_Bokeh <- figure(data = select(AllDays_NewTravelDist_10Pct,
TravelDistance_Mi_NewHvrs,
TravelDistance_Mi,
DistMethod
),
xlim = c(0, 1.5),
ylim = c(0, 1.5),
legend_location = "bottom_right"
) %>%
ly_points(x = TravelDistance_Mi,
y = TravelDistance_Mi_NewHvrs,
color = DistMethod,
hover = c(TravelDistance_Mi_NewHvrs, TravelDistance_Mi, DistMethod)
) %>%
ly_abline(a = 0, b = 1, color = "red")
TravDist_MiVsCalc_Bokeh